{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# CPSC 330 Lecture 4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Lecture plan\n",
    "\n",
    "- 👋\n",
    "- **Turn on recording**\n",
    "- Announcements\n",
    "- Cross-validation True/False poll (5 min)\n",
    "- Logistic regression intro (5 min)\n",
    "- `CountVectorizer` and feature preprocessing pitfalls (35 min)\n",
    "- Break (5 min)\n",
    "- Logistic regression: `predict_proba` (10 min)\n",
    "- Logistic regression: coefficients and interpretation (10 min)\n",
    "- Logistic regression with continuous features (10 min)\n",
    "- True/False questions (time-permitting)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning objectives\n",
    "\n",
    "- Apply logistic regression to classification problems\n",
    "- Correctly apply `CountVectorizer` to preprocess text data (avoiding common pitfalls)\n",
    "- Use `predict_proba` and interpret its output with a healthy dose of skepticism\n",
    "- Interpret the coefficients of a trained logistic regression model\n",
    "- Identify the key hyperparameter of `LogisticRegression` (namely, `C`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "plt.rcParams['font.size'] = 16\n",
    "\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.model_selection import train_test_split \n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "from plot_classifier import plot_classifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Announcements\n",
    "\n",
    "- Add/drop deadline has passed - welcome to the course!\n",
    "- hw1 grades posted, hw2 deadline passed (expect for late joiners), hw3 will posted later today, hw2 solutions will be posted tomorrow.\n",
    "- Changes to class format.\n",
    "  - Watch parties cancelled due to very low attendance.\n",
    "  - We have Yuxi with us in lecture today.\n",
    "  - We'll try Zoom chat today and probably try Piazza Live Q&A on Thursday to see what works better.\n",
    "  - There are designated Q&A periods sprinkled throughout this lecture, where I will stop and answer questions.\n",
    "  - Yuxi will answer some questions in real-time as well.\n",
    "- For those who were added to the course recently:\n",
    "  - If you did not complete hw1, you can drop it as your lowest hw grade.\n",
    "  - You can have until 11:59pm tonight to complete hw2. \n",
    "  - You can still submit the syllabus quiz."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cross-validation True/False poll (5 min)\n",
    "\n",
    "https://piazza.com/class/kb2e6nwu3uj23?cid=178"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logistic regression intro (5 min)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Motivating example \n",
    "\n",
    "- Sentiment analysis: predict the sentiment of text, such as a movie review.\n",
    "- Targets: positive 👍 and negative 👎\n",
    "- Features: words (e.g., *excellent*, *well* for 👍 and *boring* for 👎)\n",
    "\n",
    "<blockquote> \n",
    "    <p>Review 1: This movie was <b>excellent</b>! The performances were oscar-worthy!  👍 </p> \n",
    "    <p>Review 2: What a <b>boring</b> movie! I almost fell asleep twice while watching it. 👎 </p> \n",
    "    <p>Review 3: I enjoyed the movie. <b>Excellent</b>! 👍 </p>             \n",
    "</blockquote>  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Intuition behind a linear classifier\n",
    "\n",
    "- Learn coefficients (weights) associated with different features\n",
    "\n",
    "<img src='img/words_coeff.png' width=\"300\" height=\"300\" />\n",
    "\n",
    "- Use these learned coefficients to make predictions. For example, consider the following review $x_i$. \n",
    "<blockquote> \n",
    "    <p>It got a bit <b>boring</b> at times but the direction was <b>excellent</b> and the acting was <b>flawless</b>. </p> \n",
    "</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Linear classifier \n",
    "\n",
    "- $\\textrm{score}(review) = $ coefficient(*boring*) + coefficient(*excellent*) + coefficient(*flawless*) = $-1.40 + 1.93 + 1.43 = 1.96$\n",
    "- Since $score(review) = 1.96 > 0$, predict the review as positive 👍. \n",
    "- Components of a linear model\n",
    "    - input features\n",
    "    - coefficients (weights), one per feature\n",
    "    - bias or intercept"
   ]
  },
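  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of the score computation above, the snippet below sums the coefficients of the words that appear in the review. The coefficient values are the ones quoted above; the dictionary `coefficients`, the `intercept` of 0, and the helper `score_review` are hypothetical names and assumptions made only for illustration.\n",
    "\n",
    "```python\n",
    "# Coefficients quoted in the example above (illustrative values only).\n",
    "coefficients = {'boring': -1.40, 'excellent': 1.93, 'flawless': 1.43}\n",
    "intercept = 0.0  # assumption: the worked example does not include a bias term\n",
    "\n",
    "\n",
    "def score_review(review, coefficients, intercept=0.0):\n",
    "    # Sum the coefficients of the known words that are present in the review.\n",
    "    words = set(review.lower().replace('.', ' ').replace(',', ' ').split())\n",
    "    return intercept + sum(w for word, w in coefficients.items() if word in words)\n",
    "\n",
    "\n",
    "review = 'It got a bit boring at times but the direction was excellent and the acting was flawless.'\n",
    "score = score_review(review, coefficients, intercept)\n",
    "prediction = 'pos' if score > 0 else 'neg'  # positive score means predict positive\n",
    "print(round(score, 2), prediction)\n",
    "```\n",
    "\n",
    "A trained model would learn both the coefficients and the intercept from data rather than having them written by hand."
   ]
  },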
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Logistic regression\n",
    "\n",
    "In particular, we will focus on \n",
    "- use `fit`, `predict`, `predict_proba`\n",
    "- use `coef_` to interpret the model weights \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "(Pause for Q&A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `CountVectorizer` and the Golden Rule (35 min)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Let's train it on a dataset\n",
    "\n",
    "I have downloaded the [IMDB movie review dataset](https://www.kaggle.com/utathya/imdb-review-dataset) from Kaggle. You should be able to download it as well. I did not push it to the home repo because it is large and because I don't have permission to redistribute it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>review</th>\n",
       "      <th>label</th>\n",
       "      <th>file</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>12438</th>\n",
       "      <td>test</td>\n",
       "      <td>As Jennifer Denuccio used to say on Square Peg...</td>\n",
       "      <td>neg</td>\n",
       "      <td>9946_2.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5705</th>\n",
       "      <td>test</td>\n",
       "      <td>With Knightly and O'Tool as the leads, this fi...</td>\n",
       "      <td>neg</td>\n",
       "      <td>3886_3.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11675</th>\n",
       "      <td>test</td>\n",
       "      <td>Take a bad script, some lousy acting and throw...</td>\n",
       "      <td>neg</td>\n",
       "      <td>9259_1.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9824</th>\n",
       "      <td>test</td>\n",
       "      <td>Strange things happen to Americans Will (Greg ...</td>\n",
       "      <td>neg</td>\n",
       "      <td>7593_3.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22581</th>\n",
       "      <td>test</td>\n",
       "      <td>Sometimes, you're up late at night flipping th...</td>\n",
       "      <td>pos</td>\n",
       "      <td>7824_7.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31555</th>\n",
       "      <td>train</td>\n",
       "      <td>With a cast of stalwart British character acto...</td>\n",
       "      <td>neg</td>\n",
       "      <td>4650_2.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36478</th>\n",
       "      <td>train</td>\n",
       "      <td>There's a lot of movies that have set release ...</td>\n",
       "      <td>neg</td>\n",
       "      <td>9081_1.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35878</th>\n",
       "      <td>train</td>\n",
       "      <td>Welcome to movie 17 on the chilling classics 5...</td>\n",
       "      <td>neg</td>\n",
       "      <td>8541_1.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16040</th>\n",
       "      <td>test</td>\n",
       "      <td>This is a forgotten classic of a film, and Har...</td>\n",
       "      <td>pos</td>\n",
       "      <td>1937_10.txt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12996</th>\n",
       "      <td>test</td>\n",
       "      <td>Spoilers: This movie has it's problems, but in...</td>\n",
       "      <td>pos</td>\n",
       "      <td>10447_7.txt</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10000 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        type                                             review label  \\\n",
       "12438   test  As Jennifer Denuccio used to say on Square Peg...   neg   \n",
       "5705    test  With Knightly and O'Tool as the leads, this fi...   neg   \n",
       "11675   test  Take a bad script, some lousy acting and throw...   neg   \n",
       "9824    test  Strange things happen to Americans Will (Greg ...   neg   \n",
       "22581   test  Sometimes, you're up late at night flipping th...   pos   \n",
       "...      ...                                                ...   ...   \n",
       "31555  train  With a cast of stalwart British character acto...   neg   \n",
       "36478  train  There's a lot of movies that have set release ...   neg   \n",
       "35878  train  Welcome to movie 17 on the chilling classics 5...   neg   \n",
       "16040   test  This is a forgotten classic of a film, and Har...   pos   \n",
       "12996   test  Spoilers: This movie has it's problems, but in...   pos   \n",
       "\n",
       "              file  \n",
       "12438   9946_2.txt  \n",
       "5705    3886_3.txt  \n",
       "11675   9259_1.txt  \n",
       "9824    7593_3.txt  \n",
       "22581   7824_7.txt  \n",
       "...            ...  \n",
       "31555   4650_2.txt  \n",
       "36478   9081_1.txt  \n",
       "35878   8541_1.txt  \n",
       "16040  1937_10.txt  \n",
       "12996  10447_7.txt  \n",
       "\n",
       "[10000 rows x 4 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "imdb_df = pd.read_csv('data/imdb_master.csv', index_col=0, encoding=\"ISO-8859-1\")\n",
    "imdb_df = imdb_df[imdb_df['label'].str.startswith(('pos','neg'))]\n",
    "imdb_df = imdb_df.sample(frac=0.2, random_state=999) # Take a subsample of the dataset for speed\n",
    "imdb_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10000, 4)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "imdb_df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Above, we used some key words in a review to decide how to classify it. \n",
    "- But here, we just have raw text:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"This is an example of why the majority of action films are the same. Generic and boring, there's really nothing worth watching here. A complete waste of the then barely-tapped talents of Ice-T and Ice Cube, who've each proven many times over that they are capable of acting, and acting well. Don't bother with this one, go see New Jack City, Ricochet or watch New York Undercover for Ice-T, or Boyz n the Hood, Higher Learning or Friday for Ice Cube and see the real deal. Ice-T's horribly cliched dialogue alone makes this film grate at the teeth, and I'm still wondering what the heck Bill Paxton was doing in this film? And why the heck does he always play the exact same character? From Aliens onward, every film I've seen with Bill Paxton has him playing the exact same irritating character, and at least in Aliens his character died, which made it somewhat gratifying...<br /><br />Overall, this is second-rate action trash. There are countless better films to see, and if you really want to see this one, watch Judgement Night, which is practically a carbon copy but has better acting and a better script. The only thing that made this at all worth watching was a decent hand on the camera - the cinematography was almost refreshing, which comes close to making up for the horrible film itself - but not quite. 4/10.\""
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "imdb_df.loc[1,\"review\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need a way to transform this raw text into use usual tabular format, where each column is a feature... hmm... 🤔\n",
    "\n",
    "<br><br><br><br>\n",
    "\n",
    "How about this: each word is a feature (column), and we check whether the word is present or absent in the review 💡"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.feature_extraction.text import CountVectorizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a `CountVectorizer` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "vec = CountVectorizer(binary=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note `binary=True` means just check whether a word is present (1) or absent (0), instead of counting the number of occurrences of the word."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Call `fit`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CountVectorizer(binary=True)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vec.fit(imdb_df[\"review\"]) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For feature preprocessing objects, called _transformers_, we use `transform` instead of `predict` (indeed, it's not a prediction):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = vec.transform(imdb_df[\"review\"])"
   ]
  },
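  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `binary=True` changes, here is a small sketch on two made-up documents (`toy_docs` is hypothetical and not part of the IMDB data). It fits one `CountVectorizer` with the default counting behaviour and one with `binary=True`, then transforms the same documents with each.\n",
    "\n",
    "```python\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "\n",
    "toy_docs = ['good good movie', 'boring movie']  # made-up documents for illustration\n",
    "\n",
    "count_vec = CountVectorizer()              # default: count occurrences of each word\n",
    "binary_vec = CountVectorizer(binary=True)  # only record presence (1) or absence (0)\n",
    "\n",
    "count_vec.fit(toy_docs)\n",
    "binary_vec.fit(toy_docs)\n",
    "\n",
    "counts = count_vec.transform(toy_docs).toarray()\n",
    "presence = binary_vec.transform(toy_docs).toarray()\n",
    "\n",
    "# The learned vocabulary is stored in the `vocabulary_` attribute.\n",
    "print(sorted(count_vec.vocabulary_))\n",
    "print(counts)\n",
    "print(presence)\n",
    "```\n",
    "\n",
    "The two matrices differ only where a word is repeated within a single document."
   ]
  },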
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0, 0, 0, ..., 0, 0, 0]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X[1].toarray()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hmm, is it only zeros for this first review?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10000, 52863)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's a lot of columns!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['00',\n",
       " '000',\n",
       " '001',\n",
       " '007',\n",
       " '0093638',\n",
       " '00pm',\n",
       " '00s',\n",
       " '01',\n",
       " '02',\n",
       " '03',\n",
       " '04',\n",
       " '041',\n",
       " '05',\n",
       " '06',\n",
       " '0615',\n",
       " '06th',\n",
       " '07',\n",
       " '08',\n",
       " '089',\n",
       " '0and']"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vec.get_feature_names()[:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Ok, this seems like a lot of useless \"words\".\n",
    "- We can use some hyperparameters of the `CountVectorizer` to just take more common words.\n",
    "- Note: later in the course we will explore more options here.\n",
    "  - For now we'll just focus on subsetting the features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "vec = CountVectorizer(min_df=50, binary=True) # words that appear at least n times\n",
    "\n",
    "vec.fit(imdb_df[\"review\"]) \n",
    "\n",
    "X = vec.transform(imdb_df[\"review\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10000, 3243)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.shape"
   ]
  },
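  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `min_df` counts documents, not total occurrences: an integer `min_df=n` keeps a word only if it appears in at least `n` documents. A small sketch with made-up documents (`toy_docs` and `vec_toy` are hypothetical names) illustrates the difference.\n",
    "\n",
    "```python\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "\n",
    "# Made-up documents: 'great' occurs three times, but only in one document.\n",
    "toy_docs = ['great great great film', 'boring film', 'boring plot']\n",
    "\n",
    "vec_toy = CountVectorizer(min_df=2, binary=True)  # keep words found in at least 2 documents\n",
    "vec_toy.fit(toy_docs)\n",
    "\n",
    "kept_words = sorted(vec_toy.vocabulary_)  # words that survived the min_df filter\n",
    "print(kept_words)\n",
    "```\n",
    "\n",
    "Here `great` is dropped despite its three occurrences, because they all come from a single document."
   ]
  },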
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also explicitly ask for the top $n$ words:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "vec = CountVectorizer(max_features=1000, binary=True) # max n columns\n",
    "\n",
    "vec.fit(imdb_df[\"review\"]) \n",
    "\n",
    "X = vec.transform(imdb_df[\"review\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note there is shorthand for this in scikit-learn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = vec.fit_transform(imdb_df[\"review\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10000, 1000)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>10</th>\n",
       "      <th>20</th>\n",
       "      <th>30</th>\n",
       "      <th>80</th>\n",
       "      <th>able</th>\n",
       "      <th>about</th>\n",
       "      <th>above</th>\n",
       "      <th>absolutely</th>\n",
       "      <th>across</th>\n",
       "      <th>act</th>\n",
       "      <th>...</th>\n",
       "      <th>wrong</th>\n",
       "      <th>year</th>\n",
       "      <th>years</th>\n",
       "      <th>yes</th>\n",
       "      <th>yet</th>\n",
       "      <th>york</th>\n",
       "      <th>you</th>\n",
       "      <th>young</th>\n",
       "      <th>your</th>\n",
       "      <th>yourself</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9995</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9996</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9997</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9998</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9999</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10000 rows × 1000 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      10  20  30  80  able  about  above  absolutely  across  act  ...  wrong  \\\n",
       "0      0   0   0   0     0      1      0           0       0    1  ...      0   \n",
       "1      0   0   0   0     0      0      0           0       0    0  ...      0   \n",
       "2      0   0   0   0     1      1      0           0       1    0  ...      0   \n",
       "3      1   0   0   0     0      0      0           0       0    0  ...      0   \n",
       "4      1   0   0   1     0      1      0           0       0    0  ...      0   \n",
       "...   ..  ..  ..  ..   ...    ...    ...         ...     ...  ...  ...    ...   \n",
       "9995   0   0   0   0     0      1      0           0       0    0  ...      1   \n",
       "9996   0   0   0   0     0      1      0           1       0    0  ...      0   \n",
       "9997   1   1   1   0     0      0      0           1       0    0  ...      0   \n",
       "9998   0   0   0   0     0      0      0           0       0    0  ...      0   \n",
       "9999   1   0   0   0     0      1      0           0       1    0  ...      0   \n",
       "\n",
       "      year  years  yes  yet  york  you  young  your  yourself  \n",
       "0        1      0    0    0     0    0      0     1         0  \n",
       "1        0      1    0    0     0    0      0     0         0  \n",
       "2        0      0    0    0     0    1      1     0         0  \n",
       "3        0      0    0    0     0    0      1     0         0  \n",
       "4        1      1    0    0     0    1      0     1         0  \n",
       "...    ...    ...  ...  ...   ...  ...    ...   ...       ...  \n",
       "9995     0      0    0    0     0    0      0     0         0  \n",
       "9996     0      1    0    0     0    1      1     1         0  \n",
       "9997     0      0    0    0     0    1      0     0         0  \n",
       "9998     0      0    0    0     0    1      0     0         0  \n",
       "9999     0      0    0    0     0    0      0     0         0  \n",
       "\n",
       "[10000 rows x 1000 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_df = pd.DataFrame(data=X.toarray(), columns=vec.get_feature_names())\n",
    "data_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "12438    neg\n",
       "5705     neg\n",
       "11675    neg\n",
       "9824     neg\n",
       "22581    pos\n",
       "        ... \n",
       "31555    neg\n",
       "36478    neg\n",
       "35878    neg\n",
       "16040    pos\n",
       "12996    pos\n",
       "Name: label, Length: 10000, dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = imdb_df['label']\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "(Pause for Q&A)\n",
    "\n",
    "<br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, now we split the data...\n",
    "\n",
    "Note there are two different ways to call `train_test_split`:\n",
    "\n",
    "```python\n",
    "df_train, df_test = train_test_split(df_all)\n",
    "```\n",
    "\n",
    "or\n",
    "\n",
    "```python\n",
    "X_train, X_test, y_train, y_test = train_test_split(X_all, y_all)\n",
    "```\n",
    "\n",
    "Both are fine.\n",
    "\n",
    "Note the order of the outputs though, I find it un-intuitive!"
   ]
  },
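  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the two calling styles, here they are on a tiny made-up DataFrame. The name `toy_df` and its rows are invented for illustration; only the `review` and `label` column names mirror our IMDB data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Toy data, invented for illustration only\n",
    "toy_df = pd.DataFrame({\n",
    "    'review': ['good', 'bad', 'great', 'awful', 'fine', 'poor', 'nice', 'dull'],\n",
    "    'label': ['pos', 'neg', 'pos', 'neg', 'pos', 'neg', 'pos', 'neg'],\n",
    "})\n",
    "\n",
    "# Style 1: split the whole DataFrame, then separate X and y afterwards\n",
    "toy_train, toy_test = train_test_split(toy_df, random_state=123)\n",
    "\n",
    "# Style 2: pass X and y separately; the output order is\n",
    "# X_train, X_test, y_train, y_test (NOT X_train, y_train, ...)\n",
    "X_tr, X_te, y_tr, y_te = train_test_split(\n",
    "    toy_df['review'], toy_df['label'], random_state=123\n",
    ")\n",
    "\n",
    "# Check whether both styles selected the same training rows\n",
    "print(toy_train.index.equals(X_tr.index))\n",
    "```\n",
    "\n",
    "With the same `random_state`, both styles should pick the same rows, so the choice is mostly about convenience."
   ]
  },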
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_imdb, X_test_imdb, y_train_imdb, y_test_imdb = train_test_split(X, y, random_state = 123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, this seems reasonable. Are we all good?\n",
    "\n",
    "<br><br><br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### NO!!\n",
    "\n",
    "- Remember, the test data should be **unseen** data.\n",
    "- Here, the test data influenced how we preprocessed the training data:\n",
    "  - We used it to determine which words were the top words.\n",
    "  - In fact, some of our features might be words that don't even appear in the training set!!\n",
    "  - Thus, **the test error is no longer an accurate measure of how our model generalizes to unsees data**.\n",
    "- This is called The Golden Rule of ML: the test data should not influence the training process in any way.\n",
    "- If we violate the Golden Rule, our test score will be overly optimistic!\n",
    "  - That is, deployment accuracy will probably be worse."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The \"Golden Rule\"\n",
    "\n",
    "- Previously, we used cross-validation to inform our choice of the least overfit model.\n",
    "- Is this okay? It can be, if we don't do this too many times.\n",
    "- If we use it too many times, we suffer from \"optimization bias\"."
   ]
  },
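  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why picking the best of many validation scores is risky, here is a small simulation. It is only a sketch and not part of the IMDB analysis; all names and sizes below are made up. The labels are coin flips and every 'model' guesses at random, so no model can truly beat 50% accuracy:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n_val, n_test, n_models = 50, 10_000, 1_000\n",
    "\n",
    "# Labels are coin flips, so there is nothing to learn\n",
    "y_val = rng.integers(0, 2, size=n_val)\n",
    "y_test = rng.integers(0, 2, size=n_test)\n",
    "\n",
    "# Each 'model' just guesses at random on the validation set\n",
    "val_preds = rng.integers(0, 2, size=(n_models, n_val))\n",
    "val_acc = (val_preds == y_val).mean(axis=1)\n",
    "\n",
    "# Keep the model with the best validation accuracy\n",
    "best_val_acc = val_acc.max()\n",
    "\n",
    "# On fresh data, the 'best' model is still only guessing\n",
    "test_preds = rng.integers(0, 2, size=n_test)\n",
    "test_acc = (test_preds == y_test).mean()\n",
    "\n",
    "print(f'best validation accuracy: {best_val_acc:.2f}')\n",
    "print(f'accuracy on fresh data:   {test_acc:.2f}')\n",
    "```\n",
    "\n",
    "The best validation accuracy should come out well above 0.5 even though every model is useless, while accuracy on fresh data stays near 0.5. The more models we compare, and the smaller the validation set, the bigger this gap tends to be."
   ]
  },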
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The \"golden rule\" is that we need to be extremely vigilant about what data is being used for what.\n",
    "- A test set should only be used \"once\".  \n",
    "- Even if only used once, it won't be a perfect representation of deployment error:\n",
    "  1. Bad luck (which gets worse if it's a smaller set of data)\n",
    "  2. The deployment data comes from a different distribution\n",
    "  3. And if it's used more than once, then you have another problem, which is that it influenced training and is no longer \"unseen data\".\n",
    "\n",
    "Avoid this 3rd issue! The other two are bad enough."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "<br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Back to our dataset. We need to start over..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Just to be safe\n",
    "imdb_df = pd.read_csv('data/imdb_master.csv', index_col=0, encoding=\"ISO-8859-1\")\n",
    "imdb_df = imdb_df[imdb_df['label'].str.startswith(('pos','neg'))]\n",
    "imdb_df = imdb_df.sample(frac=0.2, random_state=999)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll split right away to avoid violating the Golden Rule:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# STEP 1\n",
    "imdb_train, imdb_test = train_test_split(imdb_df, random_state=123)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_imdb_raw = imdb_train['review']\n",
    "y_train_imdb = imdb_train['label']\n",
    "\n",
    "X_test_imdb_raw = imdb_test['review']\n",
    "y_test_imdb = imdb_test['label']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "vec = CountVectorizer(min_df=50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_imdb = vec.fit_transform(X_train_imdb_raw)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test_imdb = vec.fit_transform(X_test_imdb_raw)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now surely we're all good, right?\n",
    "\n",
    "<br><br><br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### NOPE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt = DecisionTreeClassifier()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt.fit(X_train_imdb, y_train_imdb);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "Number of features of the model must match the input. Model n_features is 2573 and input n_features is 1025 ",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-30-7a5c6356b78e>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX_test_imdb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m/opt/miniconda3/envs/cpsc330env/lib/python3.8/site-packages/sklearn/tree/_classes.py\u001b[0m in \u001b[0;36mpredict\u001b[0;34m(self, X, check_input)\u001b[0m\n\u001b[1;32m    425\u001b[0m         \"\"\"\n\u001b[1;32m    426\u001b[0m         \u001b[0mcheck_is_fitted\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 427\u001b[0;31m         \u001b[0mX\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_X_predict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcheck_input\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    428\u001b[0m         \u001b[0mproba\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtree_\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    429\u001b[0m         \u001b[0mn_samples\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/opt/miniconda3/envs/cpsc330env/lib/python3.8/site-packages/sklearn/tree/_classes.py\u001b[0m in \u001b[0;36m_validate_X_predict\u001b[0;34m(self, X, check_input)\u001b[0m\n\u001b[1;32m    394\u001b[0m         \u001b[0mn_features\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    395\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mn_features_\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mn_features\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 396\u001b[0;31m             raise ValueError(\"Number of features of the model must \"\n\u001b[0m\u001b[1;32m    397\u001b[0m                              \u001b[0;34m\"match the input. Model n_features is %s and \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    398\u001b[0m                              \u001b[0;34m\"input n_features is %s \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: Number of features of the model must match the input. Model n_features is 2573 and input n_features is 1025 "
     ]
    }
   ],
   "source": [
    "dt.predict(X_test_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7500, 2573)"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train_imdb.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2500, 1025)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test_imdb.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- What happened here?\n",
    "- We fit the transformer on the training data, and transformed.\n",
    "- Then we fit the transformer on the test data, and transformed.\n",
    "- Now our data set makes no sense - we have a different number of columns in our train/test set!\n",
    "- So, for example, the decision tree will split on a word \"amazing\", which is feature 2000, but then when we predict, we don't even have that feature!\n",
    "- This is not a Golden Rule issue, it's just a preprocessing failure.\n",
    "  - But note how the Golden Rule issue is harder to catch, because there's no Python error message.\n",
    " \n",
    "Another rule: we must follow exactly the same preprocessing steps for train/test (and we'll bring this back to cross-validation next class)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, if we had used `max_features` instead of `min_df` we wouldn't have even gotten a Python error message! See below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "vec = CountVectorizer(max_features=1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_imdb = vec.fit_transform(X_train_imdb_raw)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['10', '20', '80']"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vec.get_feature_names()[:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test_imdb = vec.fit_transform(X_test_imdb_raw);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['10', '20', '30']"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vec.get_feature_names()[:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt = DecisionTreeClassifier()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt.fit(X_train_imdb, y_train_imdb);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['neg', 'pos', 'neg', ..., 'neg', 'neg', 'neg'], dtype=object)"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt.predict(X_test_imdb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Note: no error message.\n",
    "- The problem is that the coefficients correspond to the **order** of the features, so the results will be completely invalid."
   ]
  },
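  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The silent failure is easy to reproduce on a tiny invented corpus. This is only a sketch: `train_docs`, `test_docs`, `toy_vec`, and the helper `column_names` are made up for illustration.\n",
    "\n",
    "```python\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "\n",
    "# Toy corpora, invented for illustration only\n",
    "train_docs = ['bad bad plot', 'good good cast']\n",
    "test_docs = ['awful awful plot', 'good good film']\n",
    "\n",
    "def column_names(vectorizer):\n",
    "    # Words in column order (vocabulary_ maps word -> column index)\n",
    "    return sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)\n",
    "\n",
    "toy_vec = CountVectorizer(max_features=2)\n",
    "\n",
    "X_toy_train = toy_vec.fit_transform(train_docs)\n",
    "train_cols = column_names(toy_vec)\n",
    "\n",
    "# The mistake: re-fitting on the test data\n",
    "X_toy_test = toy_vec.fit_transform(test_docs)\n",
    "test_cols = column_names(toy_vec)\n",
    "\n",
    "print(X_toy_train.shape, X_toy_test.shape)  # same number of columns\n",
    "print(train_cols, test_cols)                # but different words\n",
    "```\n",
    "\n",
    "The shapes agree, so nothing downstream complains, yet the first column counts a different word in each matrix."
   ]
  },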
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Attempt 3 (I promise it will be OK this time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Just to be safe\n",
    "imdb_df = pd.read_csv('data/imdb_master.csv', index_col=0, encoding=\"ISO-8859-1\")\n",
    "imdb_df = imdb_df[imdb_df['label'].str.startswith(('pos','neg'))]\n",
    "imdb_df = imdb_df.sample(frac=0.2, random_state=999)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "# STEP 1\n",
    "imdb_train, imdb_test = train_test_split(imdb_df, random_state=123)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_imdb_raw = imdb_train['review']\n",
    "y_train_imdb = imdb_train['label']\n",
    "\n",
    "X_test_imdb_raw = imdb_test['review']\n",
    "y_test_imdb = imdb_test['label']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "vec = CountVectorizer(min_df=50, binary=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_imdb = vec.fit_transform(X_train_imdb_raw)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We do not want this next line!!!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# vec.fit(X_test_imdb_raw);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We transform the test data with the transformer _fit on the training data_!!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test_imdb = vec.transform(X_test_imdb_raw);"
   ]
  },
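  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What does `transform` do with test words that were never seen during `fit`? Here is a minimal sketch on an invented toy corpus (`train_docs`, `test_docs`, and `toy_vec` are hypothetical names, not part of the IMDB analysis):\n",
    "\n",
    "```python\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "\n",
    "# Toy corpora, invented for illustration only\n",
    "train_docs = ['a great movie', 'a terrible movie', 'great acting']\n",
    "test_docs = ['terrible acting', 'an amazing movie']\n",
    "\n",
    "toy_vec = CountVectorizer()\n",
    "X_toy_train = toy_vec.fit_transform(train_docs)  # learn the vocabulary from train only\n",
    "X_toy_test = toy_vec.transform(test_docs)        # reuse that vocabulary as-is\n",
    "\n",
    "# 'amazing' only occurs in the test documents, so it has no column\n",
    "print(sorted(toy_vec.vocabulary_))\n",
    "print(X_toy_train.shape, X_toy_test.shape)\n",
    "```\n",
    "\n",
    "Both matrices have one column per training-vocabulary word, in the same order, which is exactly what the model needs. Words that only occur in the test set are ignored."
   ]
  },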
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, let's give this a try:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "dt = DecisionTreeClassifier()\n",
    "dt.fit(X_train_imdb, y_train_imdb);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt.score(X_train_imdb, y_train_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.686"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dt.score(X_test_imdb, y_test_imdb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "(Pause for Q&A)\n",
    "\n",
    "<br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Break (5 min)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Logistic regression: `predict_proba` (10 min)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.dummy import DummyClassifier\n",
    "dc = DummyClassifier(strategy=\"prior\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "dc.fit(X_train_imdb, y_train_imdb);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5024"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dc.score(X_train_imdb, y_train_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "lr = LogisticRegression(max_iter=1000)\n",
    "lr.fit(X_train_imdb, y_train_imdb);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9833333333333333"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.score(X_train_imdb, y_train_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8256"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.score(X_test_imdb, y_test_imdb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool, we got a better test error this way!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Predicting probabilities\n",
    "\n",
    "- Logistic regression seems to do fairly well on this task.\n",
    "- Furthermore, we have a new and useful method, `predict_proba`.\n",
    "- `predict` returns the class with the highest probability.\n",
    "- Can we find the reviews where our classifier is most confident or least confident?\n",
    "- How about reviews where the classifier is not very confident? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['neg', 'pos'], dtype=object)"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.classes_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[2.77104085e-02, 9.72289591e-01],\n",
       "       [2.45126495e-01, 7.54873505e-01],\n",
       "       [2.97764289e-05, 9.99970224e-01],\n",
       "       ...,\n",
       "       [3.45866415e-02, 9.65413358e-01],\n",
       "       [3.67688371e-02, 9.63231163e-01],\n",
       "       [1.23330879e-02, 9.87666912e-01]])"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "probs = lr.predict_proba(X_test_imdb)\n",
    "probs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2500, 2)"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "probs.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- It gives two columns, the probability of class 0 and the probability of class 1. \n",
    "- We only really care about one of them, since they add to 1. Let's take the second column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.97228959, 0.75487351, 0.99997022, ..., 0.96541336, 0.96323116,\n",
       "       0.98766691])"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.predict_proba(X_test_imdb)[:,1]"
   ]
  },
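  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on how to read `predict_proba`, here is a minimal sketch on invented 1-D toy data (`X_toy`, `y_toy`, and `toy_lr` are hypothetical names):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Toy 1-D data, invented for illustration only\n",
    "X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])\n",
    "y_toy = np.array(['neg', 'neg', 'neg', 'pos', 'pos', 'pos'])\n",
    "\n",
    "toy_lr = LogisticRegression().fit(X_toy, y_toy)\n",
    "toy_probs = toy_lr.predict_proba(X_toy)\n",
    "\n",
    "# Columns follow the order of classes_, and each row sums to 1\n",
    "print(toy_lr.classes_)\n",
    "print(np.allclose(toy_probs.sum(axis=1), 1.0))\n",
    "\n",
    "# predict picks the class whose column has the largest probability\n",
    "manual_preds = toy_lr.classes_[np.argmax(toy_probs, axis=1)]\n",
    "print(np.array_equal(manual_preds, toy_lr.predict(X_toy)))\n",
    "```\n",
    "\n",
    "Always check `classes_` before slicing a column, since the column order depends on the class labels and not on which class you happen to care about."
   ]
  },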
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What's the most positive and most negative review according to our classifier?  \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9999999999977582"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.max(lr.predict_proba(X_test_imdb)[:,1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wow! Let's find that review:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1594"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "most_positive_ind = np.argmax(lr.predict_proba(X_test_imdb)[:,1])\n",
    "most_positive_ind"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\"Pitch Black\" was a complete shock to me when I first saw it back in 2000. In the previous years, I was repeatedly disappointed by all the lame sci-fi movies (Ex: STARSHIP TROOPERS) and thought that this movie wouldn't be any different. But to plainly put it: This movie freaked me out... in a good way. I wasn't aware that I was still afraid of the dark till I watched this movie; I must have buried my fear in the back of my subconscious when I was a kid and it rightfully deserves to stay there.<br /><br />The alien creatures sent shivers up my spine; the individual(s) who designed them have a twisted but brilliant and creative imagination to come up with something so impressive and grotesque. <br /><br />I loved how the writers gave each main character a history and showed their flaws and strengths without much confusion.<br /><br />Riddick's (Vin Diesel) gift for escaping out of any impossible situation and putting up a hell of a fight was jaw dropping. At first, you figure him out to be a coldly intelligent villain but in some brief moments, you can see something humane behind his animal side. But as soon you discover it, he does something maliciously devious. He certainly keeps you guessing right up to the very end. I didn't know whether to despise or admire him... he's definitely a love/hate type of character.<br /><br />Johns (Cole Hauser) was a perfect example of a character that puts up a good front but through a need for greed, shows his real intentions and what he's willing to do to survive. John's knack for knowing what buttons to push and the right words to say makes him as devious as Riddick.<br /><br />Fry (Radha Mitchell) is a character who, as Johns so nicely expressed, looked to her thine own ass first before considering the consequences. 
But what's endearing about her is that she quickly realizes the errors of her ways and tries desperately to pay penance, even while endangering her life when others discarded all human values and went for the dark hills running.<br /><br />Jack (Rhiana Griffith) simply wanted to have a hero and was the first one out of the whole group to look for that hero in Riddick; through a child's eye, good can be seen through the thick clouds of evil. I thought it was absolutely priceless when Jack shaves his head in ode to Riddick; you know what they say: Imitation is the best form of flattery.<br /><br />Imam (Keith David), like Jack, has the ability to see good in any evil. He uses philosophy to carry him through the hardships that he meets and when time permits, he rationally grieves his losses and then soldiers on. In a way, he served as a morale booster for the survivors even though most of the characters acted as though they weren't listening.<br /><br />The casting for this movie was positively perfect. Each actor shined brightly in their role and their talents blended wonderfully on-screen.<br /><br />This movie may have had a small budget but the director's leadership and the actor's performances made the movie work and allowed the audience to use their imagination instead of letting some outrageously expensive Special Effects do all the work for them. This movie is a definite Sci-Fi classic. Watch it and judge (with an open mind) for yourself. It will be well worth it.\n"
     ]
    }
   ],
   "source": [
    "print(X_test_imdb_raw.iloc[most_positive_ind])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most negative review:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.679889028418538e-13"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.min(lr.predict_proba(X_test_imdb)[:,1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "382"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "most_negative_ind = np.argmax(lr.predict_proba(X_test_imdb)[:,0])\n",
    "most_negative_ind"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I'm a big horror film buff, particularly of the 1980's subgenres. Name one Â? I've probably seen it. Last year, a new little horror movie that seemed to slip under mainstream radar called \"Saw\" was about to hit theaters. I was moderately excited. Having not heard anything about it, I thought it looked quite promising judging by the previews and posters (well, except the back and white ones with the severed hands and feet...those just looked terrible!) I saw the film on opening night. It was one of the worst experiences of my life. This movie was literally mentally and psychically painful to watch. Because it was scary?...NO! Because it was one of the most awful movies I had ever had the displeasure of seeing! First off, the construction of the screenplay and editing was utterly atrocious, even by horror movie standards. Starting off a sequence in an interrogation room with a victim (Shawnee Smith) who recently survived a serial killer's attack, then showing a flashback of what she survived? NOT SCARY! It was impossible to feel any type of tension WHATSOEVER knowing that the aforementioned victim was perfectly alright. Sure, that reverse-bear-trap thing was creepy...but WHY should I feel in the least bit frightened when CLEARLY, you just showed me she survived the ordeal? Unfortunately, the entire film was constructed this way. It starts with two guys in a cellar. Then, they show flashbacks of how they were abducted...NOT SCARY! Why? Because we already know what's gonna happen to them, seeing as how we JUST SAW the result of the attack. THEY'RE FINE! Move on with the story! Even more unfortunately, the actual story was meager at best. I couldn't have cared less for these annoying, pitiful excuses for \"characters\" and the acting didn't help. Cary Elwes was solid for the most part and then suddenly towards the end he started crying like a lost infant while straining to keep his American accent in tact (it didn't work Â? 
the audience I saw this with was in stitches). This drove him to a rash and idiotic decision even the most simple-minded wouldn't attempt. He had other options. Better ones. SMARTER ONES. Even given his intense emotional state (horribly communicated through horrible acting), it was still irrational. I didn't buy it. BAD WRITING ALERT! Furthermore, even when certain sequences were played straight-through and flashback-free, they were painfully predictable. I constantly found my foot tapping impatiently waiting for the dumb sequence to end. This happened for the entire film. I saw every single \"twist\" coming. Twenty minutes into the film, I had already called the killer's identity, not to mention his connection to his \"accomplice(s)\" as SOON as they appeared on screen. Better acting might've been able to overshadow the awful script. Instead, the actors might as well have had \"RED-HERRING\" or \"ACCOMPLICE\" tattooed across their foreheads.<br /><br />By the end of the movie, I was utterly outraged I had wasted even a fragment of my life on this film, and the entire theatre was laughing hysterically at the downright horrendous finale. Seriously, you'd think they were watching a Monty Python movie. I would've been laughing too, had I not been so angered at the film's total and utter failure to accomplish ANYTHING it set out to do. When we left, there was (no joke) a line to speak to the manager of the theatre to get their money back (didn't happen). I was absolutely positive the movie was going to be a box-office bomb. The following week, you couldn't have imagined my shock to find out \"Saw\" had hit number one at the box office and EVERYONE was talking about it (mostly individuals who found \"Napoleon Dynamite\" to be a thought-provoking epic tale and thought \"satire\" was some type of rubber). I am so utterly sickened to hear people praise this film that I often feel as though I'm going to vomit. 
It's entertainment for the most feeble and simple-minded of the human race. Those who find some weird Jigsaw clown-puppet riding on a tricycle threatening (it's a doll Â? knock it over and leave Â? what's so frightening about that?).<br /><br />Don't get me wrong, I own every \"Friday the 13th\", love my splatter movies, thought \"Napoleon Dynamite\" was hilarious, can't get enough of Freddy, Michael, Pinhead, or Leatherface, have a font appreciation for unknown horror gems and rank \"Sleepaway Camp II: Unhappy Campers\" amongst my Top 10 Favorite Slashers. However, I realize these films aren't the most sophisticated American cinema has to offer Â? I appreciate them for what they are Â? quick, easy fun. \"Saw\" is cinematic garbage. The film attempts to be a smart and semi-sophisticated, nasty little thrill ride, and bogs down to an irritating, annoying waste of time, money, energy, and celluloid. Atrocious on all accounts. Every single copy should be incinerated, along with its feeble-minded fans. Shame on all of you.<br /><br />Will I see \"Saw II\"? Maybe after I take a double-shot of Liquid Drano before I gouge out my own eyes and impale white-hot shish-kabob brochettes into my ears and colon. My Rating: 0/10. Avoid at all costs.\n"
     ]
    }
   ],
   "source": [
    "print(X_test_imdb_raw.iloc[most_negative_ind])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "(Pause for Q&A)\n",
    "\n",
    "<br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Logistic regression: coefficients and interpretation (10 min)\n",
    "\n",
    "- One of the primary advantage of linear classifiers is their ability to interpret models. \n",
    "- What features are most useful for prediction? What words are swaying it positive or negative?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Let's find the most informative words for positive and negative reviews \n",
    "\n",
    "- The information you need is exposed by the `coef_` attribute of [LogisticRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) object. \n",
    "- The vocabulary (mapping from feature indices to actual words) can be obtained as follows: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0.69748587,  0.68674704,  0.02057012, ..., -0.7324146 ,\n",
       "        -0.550139  , -0.35196627]])"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.coef_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['000', '10', '100', '11', '12', '13', '15', '1950', '1970', '20']"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
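   ],
   "source": [
    "# Get features (words in our case)\n",
    "# Note: scikit-learn >= 1.0 provides get_feature_names_out(); get_feature_names() was removed in 1.2\n",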
    "vocab = vec.get_feature_names()\n",
    "vocab[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-0.69748587,  0.68674704,  0.02057012, -0.15322152,  0.59517478,\n",
       "       -0.27450501, -0.93007   ,  0.09361947, -0.73974962,  0.06040557])"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weights = lr.coef_.ravel()\n",
    "weights[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Weight</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000</th>\n",
       "      <td>-0.697486</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.686747</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>0.020570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>-0.153222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>0.595175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>your</th>\n",
       "      <td>0.015690</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>yourself</th>\n",
       "      <td>0.175757</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>youth</th>\n",
       "      <td>-0.732415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zero</th>\n",
       "      <td>-0.550139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zombie</th>\n",
       "      <td>-0.351966</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2573 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Weight\n",
       "000      -0.697486\n",
       "10        0.686747\n",
       "100       0.020570\n",
       "11       -0.153222\n",
       "12        0.595175\n",
       "...            ...\n",
       "your      0.015690\n",
       "yourself  0.175757\n",
       "youth    -0.732415\n",
       "zero     -0.550139\n",
       "zombie   -0.351966\n",
       "\n",
       "[2573 rows x 1 columns]"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "words_weights_df = pd.DataFrame(data=weights, index=vocab, columns=['Weight'])\n",
    "words_weights_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Weight</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>perfect</th>\n",
       "      <td>1.729790</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>spectacular</th>\n",
       "      <td>1.620309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>amazing</th>\n",
       "      <td>1.600239</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>loved</th>\n",
       "      <td>1.579306</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>surprisingly</th>\n",
       "      <td>1.562321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>poorly</th>\n",
       "      <td>-2.015212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mess</th>\n",
       "      <td>-2.067356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awful</th>\n",
       "      <td>-2.137512</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>laughable</th>\n",
       "      <td>-2.204186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>waste</th>\n",
       "      <td>-2.477324</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2573 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                Weight\n",
       "perfect       1.729790\n",
       "spectacular   1.620309\n",
       "amazing       1.600239\n",
       "loved         1.579306\n",
       "surprisingly  1.562321\n",
       "...                ...\n",
       "poorly       -2.015212\n",
       "mess         -2.067356\n",
       "awful        -2.137512\n",
       "laughable    -2.204186\n",
       "waste        -2.477324\n",
       "\n",
       "[2573 rows x 1 columns]"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "words_weights_df.sort_values(by=\"Weight\", ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- These coefficients make sense!\n",
    "- Let's use this to explore one of the test cases:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9993078189259635"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = 5\n",
    "lr.predict_proba(X_test_imdb)[ex,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'pos'"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.predict(X_test_imdb)[ex]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'As a convert into the Church of Jesus Christ of Latter Day Saints, I try to absorb as much as I can of my new religion\\'s history. I was invited to attend a showing of this film with my sons & the other young men & women as well as their families of our ward. <br /><br />On a beautiful spring evening, we drove to Kirtland, Ohio to the church\\'s historical village located there. We were to have had reservations at the Vistor\\'s Center to view this movie. Since my movie viewing was limited to only a few church documentaries, I was intrigued. The only \"full length motion pictures\" of the church\\'s I had seen was \"Legacy\" and \"My Best Two Years\", both which I thought were very well written and preformed.<br /><br />At the beginning, the missionary interpretor passed out tissues stating that several people had been deeply moved to the point of tears by this movie. I thought \"OK...but it takes a lot to move me to tears.\" Imagine my surprise when I found myself sobbing! It truly is a very moving & inspirational testament to the Prophet Joseph Smith.<br /><br />See it & believe in it\\'s powerful message!'"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test_imdb_raw.iloc[ex]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can find which of the vocabulary words are present in this review:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, ..., False, False, False])"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "words_in_ex = X_test_imdb[ex].toarray().ravel().astype(bool)\n",
    "words_in_ex"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many of the words are in this review?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "94"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.sum(words_in_ex)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['and', 'as', 'at', 'beautiful', 'been', 'beginning', 'believe',\n",
       "       'best', 'both', 'br', 'but', 'by', 'can', 'center', 'church',\n",
       "       'day', 'deeply', 'evening', 'families', 'few', 'film', 'found',\n",
       "       'full', 'had', 'have', 'historical', 'history', 'imagine', 'in',\n",
       "       'into', 'is', 'it', 'latter', 'length', 'limited', 'lot', 'me',\n",
       "       'men', 'message', 'motion', 'move', 'moved', 'movie', 'moving',\n",
       "       'much', 'my', 'myself', 'new', 'of', 'ok', 'on', 'only', 'other',\n",
       "       'our', 'out', 'passed', 'people', 'pictures', 'point', 'powerful',\n",
       "       'see', 'seen', 'several', 'showing', 'since', 'smith', 'surprise',\n",
       "       'takes', 'tears', 'that', 'the', 'their', 'there', 'this',\n",
       "       'thought', 'to', 'truly', 'try', 'two', 'very', 'view', 'viewing',\n",
       "       'village', 'was', 'we', 'well', 'were', 'when', 'which', 'with',\n",
       "       'women', 'written', 'years', 'young'], dtype='<U14')"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.array(vocab)[words_in_ex]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Weight</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>and</th>\n",
       "      <td>0.027488</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>as</th>\n",
       "      <td>0.087752</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>at</th>\n",
       "      <td>-0.053590</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>beautiful</th>\n",
       "      <td>0.850001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>been</th>\n",
       "      <td>-0.358442</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>with</th>\n",
       "      <td>0.184486</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>women</th>\n",
       "      <td>-0.393783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>written</th>\n",
       "      <td>0.135455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>years</th>\n",
       "      <td>0.652555</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>young</th>\n",
       "      <td>0.418097</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>94 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Weight\n",
       "and        0.027488\n",
       "as         0.087752\n",
       "at        -0.053590\n",
       "beautiful  0.850001\n",
       "been      -0.358442\n",
       "...             ...\n",
       "with       0.184486\n",
       "women     -0.393783\n",
       "written    0.135455\n",
       "years      0.652555\n",
       "young      0.418097\n",
       "\n",
       "[94 rows x 1 columns]"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex_df = pd.DataFrame(data=weights[words_in_ex], index=np.array(vocab)[words_in_ex], columns=['Weight'])\n",
    "ex_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Weight    6.459708\n",
       "dtype: float64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex_df.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- This sum of weights is (roughly) the model's raw score for the review: the more positive it is, the closer to 1 the predicted probability of `pos` will be. \n",
    "- Likewise, the more negative it is, the closer to 0 the predicted probability will be. \n",
    "  - If the model's raw score (this weighted sum plus the intercept) were exactly 0, the predicted probability would be exactly 0.5."
   ]
  },
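  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how a weighted sum turns into a probability, using a tiny made-up corpus rather than the IMDB data (the names `toy_docs`, `toy_vec` and `toy_lr` are illustrative assumptions). For a binary problem with default settings, `LogisticRegression` computes a raw score (feature values times weights, plus the intercept) and passes it through the sigmoid function to get the probability of the second class in `classes_`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Tiny made-up corpus, for illustration only\n",
    "toy_docs = ['great movie loved it', 'awful movie waste of time',\n",
    "            'loved the great acting', 'awful plot what a waste']\n",
    "toy_labels = ['pos', 'neg', 'pos', 'neg']\n",
    "\n",
    "toy_vec = CountVectorizer()\n",
    "toy_X = toy_vec.fit_transform(toy_docs)\n",
    "toy_lr = LogisticRegression().fit(toy_X, toy_labels)\n",
    "\n",
    "# Raw score = (word counts) . (weights) + intercept\n",
    "raw_score = toy_X.toarray() @ toy_lr.coef_.ravel() + toy_lr.intercept_[0]\n",
    "\n",
    "# The sigmoid squashes the raw score into a probability for classes_[1]\n",
    "sigmoid = 1 / (1 + np.exp(-raw_score))\n",
    "\n",
    "print(toy_lr.classes_)\n",
    "print(np.allclose(sigmoid, toy_lr.predict_proba(toy_X)[:, 1]))"
   ]
  },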
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Why do people use logistic regression?  \n",
    "\n",
    "- Logistic regression is extremely popular!\n",
    "- Fast training and testing.\n",
    "  - This makes training on huge datasets feasible.\n",
    "- Interpretability\n",
    "  - Each weight tells you how much a given feature changes the prediction, and in which direction."
   ]
  },
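  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the interpretability point concrete, here is a small sketch on synthetic data (`demo_X`, `demo_y` and `demo_lr` are made-up names, not part of the lecture data). Increasing one feature by 1 changes the model's raw score (the log-odds) by exactly that feature's weight; how much the probability moves depends on where you start."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Synthetic count-like features; the label depends on features 0 and 1 plus noise\n",
    "rng = np.random.default_rng(1)\n",
    "demo_X = rng.integers(0, 3, size=(200, 3)).astype(float)\n",
    "demo_y = (demo_X[:, 0] - demo_X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)\n",
    "demo_lr = LogisticRegression().fit(demo_X, demo_y)\n",
    "\n",
    "x = np.array([[1.0, 1.0, 1.0]])\n",
    "x_plus = x.copy()\n",
    "x_plus[0, 0] += 1  # one more unit of feature 0\n",
    "\n",
    "# The change in raw score equals the weight of feature 0\n",
    "change = demo_lr.decision_function(x_plus) - demo_lr.decision_function(x)\n",
    "print(change[0], demo_lr.coef_[0, 0])"
   ]
  },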
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "(Pause for Q&A)\n",
    "\n",
    "<br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logistic regression with continuous features (5 min)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "cilantro_df = pd.read_csv('data/330-students-cilantro.csv')\n",
    "cilantro_df.columns = [\"meat\", \"grade\", \"cilantro\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
    "cilantro_train, cilantro_test = train_test_split(cilantro_df, random_state=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>meat</th>\n",
       "      <th>grade</th>\n",
       "      <th>cilantro</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>100.0</td>\n",
       "      <td>90</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123</th>\n",
       "      <td>100.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119</th>\n",
       "      <td>85.0</td>\n",
       "      <td>85</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>100.0</td>\n",
       "      <td>80</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>100.0</td>\n",
       "      <td>80</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      meat  grade cilantro\n",
       "98   100.0     90       No\n",
       "123  100.0     90      Yes\n",
       "119   85.0     85      Yes\n",
       "53   100.0     80      Yes\n",
       "33   100.0     80      Yes"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cilantro_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [],
   "source": [
    "cilantro_X_train = cilantro_train[[\"meat\", \"grade\"]]\n",
    "cilantro_y_train = cilantro_train[\"cilantro\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr = LogisticRegression()\n",
    "lr.fit(cilantro_X_train, cilantro_y_train);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAADrCAYAAABXYUzjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAdFUlEQVR4nO3df3RU9Z3/8ef8yA8KVAWlnMVq/ZGuAQmsWqwWvu6q3y4VRARb2uqq3Z7u8bhnv+r31LNttbJYrWe/9Oi23e5S22WlYNdFpASqVtoVRdHKFpeEoAIKgqRGIASSkMlkftzvH5NJ7kzuTOZnPvdOXo9zOMAkM/O5dy6vfLifz/vz8VmWhYiIjDy/6QaIiIxWCmAREUMUwCIihiiARUQMUQCLiBiiABYRMSSYzzefMabWmjJ+HACxuMW+9uOcD+wH6iZOIOD3laGJIjLaeDVfIrE4+453cCHwLlA34QyqAn52H20/ZlnWWenfn1cATxk/jnWLbwDgJ9t3MOvESVbFYtwaCHD6uZ/kzlmXluQgRGR082q+fG3Dr/kK8CTwVeD96ipWLpxP/YqVB52+v6BbEJ3hML9s3s13YzEAHojFeLJ5N53hcKHtFhEBvJsvrZ1dNLUdYVn/3x8EdrYdobWzK+NzCgrg1U0tzLcsLuz/+4XAPMtiTVNLIS8nIjLAq/ly/4svcyOktHsh8N0XX874nLwDOP2nU5JXfkqJiHt5NV/Se79JyV4wUO30vLwDeHVTC1fE4wSAA7ZfAeCz8bjrf0qJiHt5NV/uf/Fl/hc4tns24IPznJ6X1yAcwPvHO2iuruKqDF/3H+/I9yVFRADv5kvbyS7eAmZm+LoPahwfz2c1tIsnnWklZ0GIiEhu6les3GFZ1mXpj6sQQ0TEEAWwSAXqDIf5m03Pu3bQKhdeO4a4ZbFp73sseGozM1f8Owue2symve8Rz3KXQQEsUoFWN7WwvfVD1w5a5cJLxxC3LP7PC6+xdGsHB0+MxcLi/RPjWbr1OHe98HrG5ymARSpMcirXBnD11K1svHYMz+7bz2uHawhF1xJgD41AkLcJRZ9m2+EqgAlOz1MAi1SYZCHDXLxRwODEa8ewqvkQoegDVPN1FgJzgRuAKr5BKLoU8H/C6XkKYJEK4tUyXjsvHkPbqW5gIgG28WD/Y98DgrxCf+e3NIUYMvK8NhjhpBKOwQu8WsZr58VjmDx2HNXcwUJSS5ETveC/Behzep4C2AO8NBiRSSUcg9t5tYzXzqvHcH3dJALsGej9JiV6wW8D8Xan5ymAXc5rgxFOKuEYvMCrZbx2Xj2Gl/YfYA5ZS5HPcHpe3qXIMrKcBiO8sC6qXSUcgxd4tYzXzqvH0NY5WIpsn/XrG/xdpche0xkO84XV/8kb0ejACvuXB4M8/1dL+HiN4+fpOpVwDF7RGQ7zzc0v8oPPX61z6zIqRfYgLw5GpKuEY/AK3Wf3noICuLWzizn/vibrSu/ZaER8ePbBiBPAF4ATuH8wwm7PsXZ+9mYTdw1zDG6+HnJtm+ljsN9nf+J/mpnx0/VMX7GKmY9vYNnW14jG40balatkGe9N67Ywe9Umblq3hbW797j2ukjXGQ7z9cbnmPVvq6lf8SQNK1ZSv+JJLl+5hu4+xwkQQIEBfP+LL9MZ7su60ns2+kk9PPtgxDJgC4nFnd0+GGF31/O/xQLWkP0Y3Hw95No208dgv88+1wLLmoyfONH41Tz1VoCrVz/r2hC2l/HuPrac9tA2dh9bziOv7uWN1g9Z7cLrIt2/vdnEHz5soydyGjWE8QHVhOnsu5DPrGyEwdvBKfIO4OTK740Mv9+RE42I5yY5GDGnppqfAhuAnwJzaqrZVV3FAZcORiS1dnZx5FQPjcC/QsZjcPP1kGvbTB9D+tStb2ER5B0asajieeB3HA1N4pFtvx/RduVqsIz3DWAxUAdcA9YpGoHVTbtcdV2k6wyHWd3UQiMQ4CR+6P8z
wC3AJwEanJ6bdwAn9z2ay/D7HTnxWomhKcvnXsvm22/mxmn1fDEQYC6wOBBg0bR6Nt9+M8vnXmu6iVnZr5M/JTEh3ekY3Hw95No208eQfp99EwyUw15PnAD/Aixj/Z5jI9quXCXLeKF24LEgP2AB/T36mLuui3Srm1q4nsT5PpPIwLlfCFTzHfr/3+c44yyvAI7E4nnv+mnnxRJDk7x6vuz7Y50APiAxIR1Sj8HNx5dr20wfQ/r7nwB+DAMFAd8nRBWPAucSjma+F2lSoozX3kE8QZB/4mFCADyE5ZrrIl3y/D8MHAQ6GDz3ydttsJ3UyWmD8grg1s6uvHf9tDM9Im56oCRfps9Xoey7w/4QWACOx+Dm48u1baaPIb1wYRkwj9TznegF/yM1QcflCIybPHYc0Dzw90TvN+7K6yKd/fP/a3DMx2r+HxluAecXwD3RaMZdP4frBbuhxND0QEk+3HC+CpHe+/0xcF/a9ySPYU1TiyuPL9dz74bPaKBwobaGS/1+fgp8N+17Er3gZ5h/oWMxlnG3NZzDmOAyoJf03m+SG66LdPbP/yDwOjjmY6IXnHaR9MsrgMeTudRuuF6w6RJD0wMl+TJ9vgpl3x12GfBZnK+Zz8RiTI7FXHl8uZ57N3xGybGCzbffzFdmXMxsMv0btfjEmDFlb08h5tWdz5Vn9zEmOIsgX2c2EVdeF+nsn/9XYbhdkQNOr5FXKXI3mXf9nHAyew/YdImh18phTZ+vQtl3h03e9Zph+7oPOK22hlN9EfD5uKq6yvF1TB5frufebZ/RoY4TvFNbw2ciUcKxwSlnNQE/H6sKUttxYkTbkyu/z8eP/vJKnnt3P8u3/obfRyI04CNRvTsGiAI9VOGu697++Xf0Jqaezcjwvc43IEZJKbLXy2FVYmqOzv3I2rT3PZZu7eifklZr+0ovY4KzePCqCcyvu8BU8wo2qkuRTQ+UFMtL964rjc79yHKakpZQSyi6lCeaD5loVtkUFMBemk2QaaDk7liMn7/ZVHA59UjpDIdZ09RCA4kwsJ9zt38Oyfat3b0nr51ic5UsX71x7e+45PHV3Lj2dyV97bW79/DzN5sYC/z8zSbW7t5TktceCaaujWLf94PO41SxhMSsiINUMYHEBC+A6bR1d5f1/QvVGQ7zjU3Pc9+WV5i64ikaVqxk6oqnhi0DLyiAvdQryDRQshqIkX8hyUhb3dTCefE4zcB5aYMQbv8cfrFzF2+0fsgjr+5z3Cm2mDCzl6++e/wyYvEo+47PKulrP/LqPiaQGPs4HXjk1b1Fv/ZIMXVtFPu+8Ug3PmJUM59qbsZHB1X8Vf9Xd/VPWSvf+xdq1c5dbG/9kMY9HVTTgw+oomegDDyTvAPYa7MJ3j/eQVNVFTOAq4CrgTkkSmIbgV0fHXXtMSR7vwctiw3AIcsa6AW7/XOwl2fGrS7HnWKfe3d/wa8/WL76AkFeGCi7DUU3l+S1t30QJG510kHiOjkBxK1uXv3AX9RrjwRT10ax77vnWDsxK0Yj4OMDAmzrv2ZeAfYAS5nxicz34U0e9y927qIR8A8pRb6Lo6FJAOc4PTfvADZddpmv5XOvZeG0i7gxEOAQid7vN2CgTPZGv9+1x5Ds/S4g0db5DPaC3f45rG5q4bq41V+e2eu4U2wx9/OS9wqD/DMLiKeU3ZbitaOxP+VMwill9xPpJRKb6vr7kKaujWLf967nfztwnZwJadfMZcCZ7Pyot2zvX6jVTS1c1/++44aUIv8NiQmZgYlOz80rgGNxy7Wlo5lkKtVMTph26zHYe7/JQob7GewF24sY3HYMyXP+Paz+8kzLcafY4e7nZZMoX/1UyqR9e9ltMa/9x+5O/DxLB1ZK2f0JLAJs4o9dnQW/drmZKo0u9n33HGvnyKkeHiRxx/c4pF0z3cB3+ejUqbK8f6HspcjNJEpJhpYi7wMsx6zNK4DbQyHPzSbIpVTTjcewuqmFybEY80lt63XApFiM8+PuLdUc
rjwzuVPs5HHZ7+dlM3nsOIJ8f0jJarLstpjXrvVFOb2/95teVnoaYWr80YJfu9xMzfgp9n2Tvd/s18zfZfxc3XDci2HIrsgLgWq+DfgcR+LyCuDjPSFXlo5mYy/VTC7tmF6q6cZj2He0nYM4tJXE4jaT0gaC3HIMuZRnJneKXVA3qeD3+VL9n1DFr4aUrH6fENWsZ0n9nxT82hNroAvnstJu4EyXTgc2VRpd7Pu+ffRYSu8332vGDcfdDLTCkF2RbaXIjv8lyyuAx+Fcaue2EkE7e6nmjdPqudrv98Qx1J01MWNbZwP1Do+74RjyKc/csv9Awe9zpLuLOVgZy26PFHELYlxNTdYdbse5tCDDVGl0se97x6bfDJzvQq4ZNxz3DcO02wcfd3qNvEqRe3w+rqpxXlHJTSWCmbitdDSbZFtnRaJYsRjjSfTKoj4flmWxw+9nnUMZr+ljsJ/jk71hLIaWZybLMocrX8/mYMcJ3q6t4YpYnN5ojLhl4ff5qA0GqA74qSniPAy3w20x7S4nU9d3se8b6uvjdRLXiQWOJb3Zzr0bjrujN0w7KkWuKF4voRbJxexVm2gPbSOxE0bSCWo5m19xikXAS1+72dXX/Ka97/HAy8cgdoBd9Az8e53OWAicS2/srQOWZZ2f/rxRUYrsVV4voRbJRfp6wDC4JvBcYAE+11/zyamLC7CGDApHYlMB/yecnldQD7i1s4svr9vAeWeczj9f93lX/2SCRGXTs/v2s6r5EH/s7sSKneKez17KTVM/jd+X6T8HZqX3fpO82gtu7eziy8808tTiG5jy8fElec3k5/qzN9/j0IlWzjl9Ct+45ALm1Z1fss+1HO0uNfv13Xaqm9pgFd3hTkJ93Zw19kzuunxaSc9JqSUW4DlOKLqdxBoQid7vNk5xH/Aw8L9dfs1f+UQjod5T7KKXfSRmP2wg0aefzhh6CUctKzbknmHhuyL3RWj+6KjrfzKl77ja1fsleiJh15eVumGt2VIqdiftdPbPtdRlznalbnepOe0o3Nr1I3r7EoOQR0+1l/yclJp9PWB4hiDfYQERNpHYSfvXuP9/fsF4mNn0EgCW9D+2hMS/188RAuKOJ7+oXZH9JGqgTU99yiZ1x9VrCPILGrHA6nF1Wal9+lz6Ly/simxX7E7aTgY/17UlL3MuZ7tLzXlH4bitHNYiFP1Wyc5JOSTXA37wqglcNPH/UsW/8vf08WMSvcgfk1g8yw3TLDPp7uvm9wSop4ooiXMfBeqp4g0C+MBx4eu8b0GMDwQ4r+0IvyQxZWQrsOiSGa5d3PymdVvYfWw5sJgg97OIR/lPQnyJMaxnHhedtYN1i//CdDMr2tc2/Jrz247wJIlr5v3Jk1i5cH5Rr5n8XKt5jMVs45fAV4BnmEOEu5h21r1Ff67laHep2a/vpBoCLCI+8G/0Gfz0sbYk56TcfrJ9Bwd27qIuHuc4sAq4lURp8h6/nwtmTndl1tSvWA/8nBpuYhHYzj30sQ5YgmVFh9wDKmhXZHup3XHc3Qse3HH1hEPZ6rOuLiutBPY94iD/nbQzSXyuEwmwreRlzlC+dpfa0B2Fn8ZPPK0cNg60Fn1ORsL7xzvYWRVMKZh6gMTiWU1VQRf/zy8GPIUfp1Lk/6BkuyIvZGip3XgX35NMjrA67bR6PTFXl5VWAvsOyZD/TtqZTB47jmruGHI9lqLMGcrX7lJLn0FQw5czlMPeXfQ5KaVM6/Yun3stN06r54uBQMoxLA4EWDStnuVzrx3xtuYmSA3rMpz7ZyjZrshOpXZu7gXf1nAOtYEHCPKYQ9lqH6FwlyvbXQnSe5FJpehNXl83iQB7hlyPpShzLme7Sy11R+HU3m9SoidmMXNS+i4T5mRat9cNO00XpiOl95tkK0UufhAu267IY1zaC55Xdz5nj2tjDiHHtn/OwpXtrgT2HZKdrpliepMv7T+QtVy4mDLncra71OwzCGpYkvWc/Gr32wZbOijbur0/2b6D
qmiUU+Q3+ye5O8pN67Ywe9Umblq3pWS7o+SiFrKee1+GLnDeuyJnK7Vz4/0Zv8/HRWeM4797uvgzh5JVMF++W6nsOyQ7Kaakt5zlwuVsd6nZdxRe9l/WQEmv4/eOZMOyyLZD+eZ399MBzPH5ON1h2QOnf6vJqXiJ2SDLgQbaQ80s3bqMzfs/4od/eUXZ50AHIeu5VymyiBiXrby+K9zH9b98mvXAImDTV7+YU/GLG3ZSHq5w6ng0utOyrD9Lf55bfiiKyCiQrbw+OfCZ3E0i11s9bthJebjCKcCxFDmvWxAiIoVK3vt9w2GAbVZzC6FojNX9jz0INPQPeA7XCx46Fc9u+J2US2G4Fdl8veExTo8rgMWz0tdAmDx2HLc1nFPSdQ86w2G+uflFfvD5qx3XIRiJNuTDbe2xS+8lJgWAy6IxjuI87W+44pfJY8fRHmomdTW1pF0jMv3OPj3Oaf2Q+hUr33N6nm5BiCc5rYGw+9jykq97kG2b85FqQ67c1p50mcrr59RU8zrwybTvz3XaX+pUPLtexgSXcXuD44bEZZPP+iHqAYsnpa6BkLz3V0coOo9th2fx3Lv7ix54sU+Xurl5N7fMuDilFzwSbciH29qTLlMRxdc2/JqL247wQ4b2jJPT/rL1gufVnc8L+9t47fAsQtGlwHRgF2OCy/jc2RGuu3DIMrxlY18/ZFEOt1DUAxZPGomBl+G2OXfD4I+b25OrtpNdvE5i2l/6rzeAD4eZ9mdfzGfaWfcyccxspp11Lw9eNWFEpqDZ5TuQqB6weFK5B17SB4weiMW4PK0X7IbBHzu3tSdXz9/2laJfw+/zMb/uAqM9/GTv12kgMRP1gMWTnHZRGFT8wEsuu5GUuw35clt7RptC1g9RABch04IiUn7lHHjJdT0Ctw3+uK09o8lw64cAjrsZK4CLkG2EXMorfRcF2As8w5jgrKIHXnLdjaScbSiE29ozmgy3fogPznN6nu4BF2i4EXIpL/saCE8030tbdzeTx43j9oZzuO7C4ua85rrNeTnbUAi3tWc0sa8fYpFY+yH5O4nfHcNBa0EU6Cfbd9DZ1MKqWIxbAwFOn3GxK1fqF5GRk8iFXayKxbnV7+fjn/4Ud0ytY/r6F3ZYlnVZ+verB1yAXEbIRWR0iB5pBaCzL8KTTS1sj8UBeCAe59L9bbBkA6x/wfG5o+oecKkGzXIZIZdUXhuwNL2+rBRmJK6z6JHWlF8ALXdv5LEzPsNc/JwJfIHEPnbXRSPs3PqzjK81qnrA9kGzQm8XZFtQRL3gzEpx7keKG9aXlcKU8jqLHW0d+HP6z921856h52j7wN9Dv32H59b8gCYrxg+BLcCPgO/FI8x46V8guTFGmlETwKUaNMu2oEhyhNztATPSvDZg6faSXnFW7HUWO9o6JGin3HMnAC9Yc/nDa22DX7CFL8CL677JlVacUySCdwNwC4k9Cq+w4vwXfMrpPUdNAGdbhT8fuY6Qy6BSnfuRkijpXU7mkt57FcAulO91Fmtvw0r7n2zL3Rt5a18EINHD3Zb8ShvZfHhgB4cJciVRrifRhmuAK/ARIIiPiOOCEKMigEs5aObeXVndyYsDll4t6R3Nsl1nY7s7IJ54PL2Haz30OKs22h54LXvQZhKy/PQxlVr28RA9ADwMbORjdHMBsMvxeaMigLMNmrm5J1YJvHju3bC+rOTH8TqLx/nF669zx9Q6rIceH/jelMC1/7kI8XgVQS5gAXtT2nA9cdbzaWLsLn5TTi/SoJk5Xj33tzWcw9KtywhF5zF0jzGV9LpJrL3NNv0r7TqLx7n0wEewZAO1JQraTHy+OAGe52FCKY9/nxCbeJbYkL53QsVPQ8u1rFRKz6vnXiW97pU+BWzKn89gY+8prvQ5lwFfEY3Q9ErmaWCl8rHqILOJOLbhc0QBK+70vIrvAWvQzByvnnuV9LqH08yElrsT3dm39kXoOdrO5g920FZ9Gpc7LncDk9v2lLmVcMZpE/h9
90c0MJHUTegt4CQ+rIjT81SKLCLG5TPn1o2seJzGNXdzaN9hIpFvk9yVo6rqEc6t+yTvvrVBpcgi4g7Z5twenjgzdaDM5eEL4PP7ueGWf+Kd5o3seOUxuk62Mv60KVw651YualjAo9/Z4Pg8BbCIlF1+c269yef3Uz9zIfUzF+b8HAWwiJRMrL0t45zbKffcycPbZg4+UOCc20qiABaRgiUXo7Gbcs+dHJ44E0ibc+vxHm45KIBFJKtY+2BPNf02wpR77uSxvdcM/L0SbiWMJAWwiAyR3rM9+5pLBv78UNcdg1/YBuD+QTK3UgCLiGPg/mb6PwCD822l9BTAIqNIzvNtu9Ag2QhQAItUsGLWuJXyUwCLVJBSrnEr5acAFvGgbPNtS7XGrZSfAljEI9IHysq9xq2UnwJYxGWS827TbyWcfc0lPFp7H9B/K0FB63kKYBHDcp5z2wV0aaCskiiARUaYU/lu+hq3MjoogEXKJK81bjVQNiopgEVKRHNuJV8KYJECac6tFEsBLDIMrXEr5aIAFnGQPlCmNW6lHBTAMqrlPOdWQStloACWUUVzbsVNFMBS0ZxmJiTn3AI8oXu2YpACWCqC5tyKFymAxZOyzbk9PHFm6kCZ5tyKSymAxROc5txaDz3O069PBDRQJt6kABZXyWvO7UbQhpDiZQpgMU5zbmW0UgDLiNGcW5FUCmApG825FclOASwlozm3IvlRAEvess25bbl7Y+qyiwpdkYwUwDKsvObcKnBFcqYAFkdOO/Bqzq1IaSmARzHNuRUxSwE8ymjOrYh7KIArlObcirifArhCpA+U2efcPlp73+BqYJpzK+IaCmCPymvOrQJXxJUUwC6nObcilUsB7DKacysyeiiAXUBzbkVGJwXwCEreTtCcWxEBBXBZac6tiGSjAC6BTHNuIbEhJOhWgogMpQAugNNAWXLebcqcW9CGkCKSkQI4B8PNuf3Da22JAgfQnFsRyZkC2EZzbkVkJI3qAHba6jw55xZInZmgwBWREht1Aaw5tyLiFhUbwJnm3J59zSWpG0Jqzq2IGFIxAey0A+8Hf5EI2pQ5t12IiLiC5wI4OeeWeGxI7zY55xb6bydsRETEtVwfwJpzKyKVynUBrDm3IjJaGAlg+3xbGDpQZj30uJZdFJGKNyIBnG2N26Shq4GJiFS2sgWw5tuKiGRXdABrjVsRkcLkHcBa41ZEpDTyCmArEgG0xq2ISCnkFcDtY8/liVmPa76tiEgJ+E03QERktFIAi4gYogAWETFEASwiYogCWETEEAWwiIghCmAREUMUwCIihiiARUQMUQCLiBiiABYRMUQBLCJiiAJYRMQQBbCIiCEKYBERQxTAIiKGKIBFRAxRAIuIGKIAFhExRAEsImKIAlhExBAFsIiIIQpgERFDFMAiIoYogEVEDFEAi4gYogAWETFEASwiYogCWETEEAWwiIghCmAREUMUwCIihiiARUQMUQCLiBiiABYRMUQBLCJiiAJYRMQQBbCIiCEKYBERQxTAIiKGKIBFRAxRAIuIGKIAFhExRAEsImKIAlhExBAFsIiIIQpgERFDFMAiIoYogEVEDFEAi4gYogAWETFEASwiYogCWETEEAWwiIghCmAREUMUwCIihiiARUQMUQCLiBiiABYRMUQBLCJiiAJYRMQQBbCIiCEKYBERQxTAIiKGKIBFRAxRAIuIGKIAFhExRAEsImKIAlhExBAFsIiIIQpgERFDFMAiIoYogEVEDFEAi4gYEiz0iVY8zjtNjex49Wm6TrYy/rQpXDr7i1w04wZ8fuW6iMhwCgpgKx6ncc3dHHq3lUjft4AGerqb+e36R9i76yUW3PKYQlhEZBgFpeQ7TY394fsKsBioAxYTibzKwX0f8E7zxpI2UkSkEhUUwDtefbq/51ub9pVaIpFvs+OVtcW3TESkwhUUwF0nW4GGDF+d3v91ERHJpqAAHn/aFKA5w1d39X9dRESy8VmWlfs3+3xHgYPA
BKg9F6b6wWf7Dgt4Kw69B4HjpW2qiIhnnWtZ1lnpD+YVwCIiUjqaKyYiYogCWETEEAWwiIghCmAREUMUwCIihiiARUQMUQCLiBiiABYRMUQBLCJiyP8HKs8MjI21gqkAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_classifier(cilantro_X_train, cilantro_y_train, lr);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0.01768175,  0.05546546]])"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.coef_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- As before, each feature is multiplied by a learned coefficient (weight); here there are just two of them.\n",
    "- A linear classifier \"slices the space in half\" with a \"hyperplane\" (with 2 features, this is just a line)."
   ]
  },
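  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hedged sketch of where that line comes from, on synthetic two-feature data rather than the cilantro survey (`toy_X2`, `toy_y2` and `toy_lr2` are made-up names). The boundary is the set of points where the raw score `w1*x1 + w2*x2 + b` equals zero, which is where the predicted probability of each class is 0.5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Two synthetic blobs, one per class\n",
    "rng = np.random.default_rng(0)\n",
    "toy_X2 = np.vstack([rng.normal(loc=[60, 60], scale=10, size=(50, 2)),\n",
    "                    rng.normal(loc=[85, 85], scale=10, size=(50, 2))])\n",
    "toy_y2 = np.array(['No'] * 50 + ['Yes'] * 50)\n",
    "\n",
    "toy_lr2 = LogisticRegression(max_iter=1000).fit(toy_X2, toy_y2)\n",
    "w1, w2 = toy_lr2.coef_.ravel()\n",
    "b = toy_lr2.intercept_[0]\n",
    "\n",
    "# Points on the boundary satisfy w1*x1 + w2*x2 + b = 0, so x2 = -(w1*x1 + b) / w2\n",
    "x1 = np.array([50.0, 100.0])\n",
    "x2 = -(w1 * x1 + b) / w2\n",
    "boundary_points = np.column_stack([x1, x2])\n",
    "\n",
    "# On the boundary, both classes get probability 0.5\n",
    "print(toy_lr2.predict_proba(boundary_points))"
   ]
  },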
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the predicted probabilities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEHCAYAAAC+1b08AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAABziElEQVR4nO2dd5xU1fn/388sHUUUrNgLRLDEDgoqVmyxRs3PWBL9xkqSr98UE9HEFlOMSTSAIip2URGwi4pKk95BKdJ73V1g+875/XGm3D73ztzZnZ29H173xcwpzzn33tnPPPO5z3OOKKWIECFChAhNH7HGnkCECBEiRAgHEaFHiBAhQpEgIvQIESJEKBJEhB4hQoQIRYKI0CNEiBChSBAReoQIESIUCSJCjxAhQoQiQUToESJEiFAkiAg9QoQIEYoELRp7AhEiRIhQzBCRQ7Ltq5RaGWisKPU/QoQIEfIHEakHJJu+SqlAKkrkoUeIECFCfjEMO6EfBpwJlAEzgY3AvsCJwB7AeGBZ0IEiDz1ChAgRGhAi0g2YBDwPPKSU2mWoaw88APwC6KWUWhTIdkToESJEiNBwEJGRwF5KqbM82nwNbFdKXRHEdhTlEiFChAgNizOBiRnaTAD6BDUcEXqECBEiNCxaAwdkaLN/ol0gRIQeIUKECA2LmcC1InKyU6WInAhcB8wKajjS0CNEiBChASEi5wEfAwp4HfgS2ADsB5wN3IB2ti9SSn0WyHZE6BEiRGgOEJEzgd8AJ6Elj58ppYZl6HMs8F/gVGAb8CzwiMqROEXkmoStjphDGhVQCtyhlHo7qN0oDj1ChAjNBbsB84GXE4cnRKQD8BkwDjgF6IaOKd8F/DOXiSil3hGRT4ArgRPQsedlaJllpFJqZzZ2Iw89QoQIzQ4ishO4x8tDF5E7gb8B+yqlKhNlA4A7gQNz9dLzgUAeesfWrdUB7doDsLmqkq1VVXRq04a927R17iBZZbuGgyyGzvtsAw3gs3GYzcJr5N47Hxc5C6OePcK4Tzmep/MppQs3lu9g084K9tmtPft22C2UsSWbm2Po8v3GreyqqaV9q5YcsW9n5qxev0UptXd2s4HTOu6rymprfLVdVFG6AKgyFA1RSg3JduwEegHjk2SewKfAI8ChwPJsjAZZ2yXoWi6BCP2Adu15+dwL2VFTw9UfjeYj4IbaOgae35fdW7UiFsvxjz2H/tn2DdovaPtYSZDG/oKO/Mwh5uOPU/zMTbznlGkumT4TnuY9+opHx1iJcz/XS+Jy3d0Izu2cncpdSdJnWyeb5VXVnPH3Z/kI+GlNDcN/dg0d27ez2HeYn6P9mGcbx3O1lcVYvbWUMx55mo+Aq2pqGfbzazjhoacDkZEV23fs4KndjvLV9iJKq5RSjlEjOWA/YI2lbKOhLitCR6f0+yWS/K/l8uaS77gM6AdcohRvLV3E/xxzXDamGpzEi5XAISQSz0DgmebTlAlc9/FHqp7lTgPnQODWdkMnTOcylfj7iytemDSTe8/vbfvTz4bAHefgQODm9vCrV0dyJXpOVwC/emWUzW5QKBSqvjZnOzlPwwxxKQ+CYdgJXdBfEsej13SZCCwNajgwoe+qq+XtJYuYWl8PwIPxek5bvIjru/6A3Vu1yti/0Ak8aJ/mTuDgTeIFQeAQihfeWASu2+n5l1VW8eKkmUypqwPggbo6ek6czq1nnkzHdnbpM18EbsSa7WVMX76GFxLvHwaOW7YaIDMheEBKYrTc00XOtWJX5iZZIBlKaMQ+if83kiWUUre61YlIS+BB4G7AtZ0bgiUWifDG4m+5DDgyUXQk2kt/c/F3zl1iYjoCDZdl36B9go4TK0kfmRvH0keOc46JmA5HOyXmw2Ww9JFhLk7zicXEdAQyH5P0Ye0nsdRh6lIipsPcJ31YJul47UXEdGQ6Z8cyFxtO55ZpLO92sdSh7cPQCdO4TCnz319c8fz46a79rLYdz9U2
95jhsN9TY//+L73LlZg54QpA9IqCWUMEpIX4OvKEb4A+ItLGUHY+sA5YkY8BlVK1SqkHgEXAo0H7ByL0+nictxYv4oGEd57Eg/F63lq8iB01+gFGYxK4n365EHhGEnchkmzm4IfAISCBO7BsmARuM28kCktfI4F7kbi5jwuJ54HAk+W+CNyjrV+bRiI2knjyKKus4oUJMxiQ8M6TeKCujhcmTKe8ukb3tdjOSOAxMQ9EzHZPrTaS92zNtjKmL1vDQ5bb/rD+z/K0NiAEYi3E1+HLnMhuIvJDEflh4kQPTrw/OFH/uIh8YejyOlABDBORY0TkKuA+4MkGiHCZhE4yCoRAksu26ip6KUUJ5qcBJUBPFefNJYv4xbHBtPRIRrGYKwAZpUno4BBIRik0HVy3c5i/hw4+dPw0esXjjn9/p8XjDB03jd/0O9N5DgFlFGt/271LvL1n2LucmZiDdU65sTmQ8NBDxMnorMwkHkocLwG3oNdPOSJZqZQqE5HzgYHAdGA7Ov78yTAn5YK2QOegnQIRek19PXNbtORMl3pVVprRRpOPRvFJ4H7n0FyjUdwIXPdxG6eBo1ECkH3WJO4jGiXZZ9nGrcxr08rRbROg7YbN5nnkicSNdeu2bGcz8EOHOeUsawtIy/AIXSn1lbbqWn+LQ9k8cKW8fGII8GbQToESi7rv1Um9dkG/YANEBG42FxG4Qx+3cZpWOGFYBO7UL9twQnMf6xj+CdxWb52zzZbQ8bb7Z+QSStitbTs15PBuvtqevXB2TmM1BkSkF/Bz7JmizyulJmdjMy+p/5GMYjFXADIKFF80SjHIKG798hGNkhOB6wLXsd1+VeQCicVouZvPKJcmBhH5K/A77OGPJwC3ishflVJ/DGo3FEKPCNxirgl44Q1J4LqfW6fCDCcEKK+u5u43P2Dg9ZeyR9s2jm39ELjffhm9cMP7sooqbh82gmdv+TF7tGtj6GOx6UXiQQjcMr5xrvF4nHemzuXJj2awbO0yDu9yOPdekruzLJL9L/xChohcC/wW/djhUeArdIjk/kBfYABwn4jMVkq9FcR2VoTeXAjc7xyaPIFD8cooOejgz0+czqRlq3TiznlnGNplr4N7t8nkgacHG/LlFCYsXsFzX0/mt5ec7WrDjw5uKPAc3xpSCZrMbxw8ii8X1lNX3QJQLF3bil+99B0QO4KcIEhJ7p5+AeIeYDNwmlJqi6F8GbBMRN5DLyJ2DxCI0INdrYDfmEFDEHMJJ8zcOHMoYZA5NEQ4oXU+9lMKKR7cgQTyHU6o++QWTuhmI1M8uJ9wwvKqal6cNJNRwIuTZqRDAh3CCd3tu8eDO83Bee72cMLyqiqGfj2FUcDQLydTXlWdtmO9d+bu9vtqvGmWz4Q9/NFwPok+70ydy5cL66mofpMSFjEaaMG3VFQPB1p1IBcIthwEt6OJ4XjgHQuZp6CU2gy8AwROvw/16y8oIVv7ZEJjxYODmcCzTuoJSOCZSDyQ+RAI3PiH40rgepIZCdwzycWr3Ckm3OHcco0Hf37iDFN6/fMTp3uSeLKf6fAzB9vc7QRujQcf8uU3XKZUaumNIWMn+yJwkZj9xvkkcCkpMfVLthk4Zi4V1QNoxa1cgb5elwMt+R/ggNz4RSlUXY2vo4mhJZmDgHaSRaZtThc8VwLPKCE0YQIvr67mF+9/QnlNbWgEbiXxjN8PLgReXlXNbSM+ZEd1rW8CB58EHjCpJ9N18Evg1raeY2Ug4rLKKl6YmE7gSSXuVFW7E57frEzHuZu/KbyyMssqqhj65RQG1CbmVlvH0LHfUFZVGZjAbSRuPR8HAk+31cfabaVAJ0qYmEwm4hGgBePJ9RGdlAgtd2vr62hi+B64SFy0zET5RWSxlktgQm/qMoqnuRBllFfmLGDq2vW8Ome+61waS0Z5ecY8Jq9ay8sz5uhuBZCVmclGGDKKlxyix9CHW3r90HHTPAk8FxnFT1amSIxnv/iGSy1zu1gphnwxmbBk
FEcSN86nRUnq3h/YaU9acQdXYE79vxwQVpEb7F8ybkcTw2tAD+BdETE9ZxCRo4BRwDHoTNVACHQl/FB4LgTe1LzwxGA2li2vrub1uQsYBbw2Zz47amtCIfAwZJTyqmpemTWXUcArM+ey07DedEMSuBfZup1bWATuqIPHdL/yqmrv9HqLl+44B9vcsydwo4xSVlXJ0C8np7zz1Nxq63jui28oq6yCkljOMoqIECuJWc7J+e/uujOOooRFKe88iUcA8/Lk2cF2bVyOJoYngfHAZcAiEVkhIt+IyArgW+ASdORL4IzUUL7amhWB6wE9ZZRX58znUoPG+crseZbTypLA9Uk4EjiYSdzUxeCBvzxzTmpuFyvFS9Pn2AlcT9LxujuSL9nLKI7nlaFtNjKKHsNyOPSzptcnj2R6/XNfT/VB4NbBvGUUPQ/LvXPQwZ/9/Bt6uszt1Pp6nv1ikjeB+5RRbARui+pJl380dY4p9d84p1xT/0XsDo/b0ZSglKoBzgPuRy/ydRB6z9KDEu/vB85PtAuEBglbbE6bPJRXV/PanPlMSS4vXF9Pz9nzufnE4+jQunVw8x7nGTScsLyqmpdnzGVKXWJudfX0nD6HW045ng4OS6/qMVy++Fzm5VTuZqMQszK/37DFll5v7NF2w+aEbds3oMWmdQz7LyivuTplZS5dv5n5bVrT13YGGu3WbrRfA+MXocOXcHp8jw+h5VeZ0fbaTdvYLPBDZc6QEcJZ0bYJet++oJSqA/4K/FX03qV7AGVKqfJc7AYmdL9euG80MIGDDxLPIR78lVnz7BonipdnzeOenidnNp8FgYM7iRsvybBpsx3112HT5/LLM09LtG/CBA4OxJi5n7HN366/lNtfepchP7s6lUyk7TYggesC29jP3X2Dbb56LDv5p9+af9mkbWZB4Bb7EhPmDnkk+cbWpv0198xwH8QHlELVNfoGF3lHgsRzIvIkQskUjbIy9U9Dq3eexIN19fScPY+bTzrW2Ut3I8ssCFz3s5dZvXPT3KbN5pbTTmCPNua5ORKr61wdygO09UPiDZGV+dzXU5i4dCXPfT2V31x8ludAgUg8CIGDTXYyj9OwXrijLV3gr12WkJIYLdq3ydwwQgqBCX3tzp3c9OGHvHrZJXTZ3YdKFotRXl3D78d+xd/OOZsOrZ1DK5sigVvx8sy5Kf11DvAr4D9AB7T++vLMedzT6+RQZRTdx3WSqZcvTpvDSXX13IZ+YPWAZW7PTprOwo2beerqi/0Re8gEXlZZxd1vfsCgn1yWTrM3zL+ssoo7Xx/N4J9ebvacHexnm5VZVlHF0HHTGAVc+/kk/v3FAurry2nRoiM/Oe1QHr+6Hy1axFxt5DMrMx6P886UOQwcM4+127ez7x4dkHgZ7//hdvbarb3znHIkcJs9BwIv3VXBjX8bwvQVaymviNOSamppTcf2JUCuW2UXp+QiATaJtiLTptHBHooKPDRxAuU1Nfx5/AQPq+aHKa/Nn8+09et5ff4Cs7mQHmQCBbHJw/Jt25nbqhVntWnDOSUlfAOcU1LCWW3aMK9lK5Zt3ebwh9swmzws37KNKegtWC5K/H9OixLObtuG+a1aMW7Jcr5ZsYZhU2Y5Xot8hxM+P3FGKs3eKZxw6IRpTPp+Fc+Pn+5i3zuc0PH+WuY+5MspXBbXD4z7xRXx+n2JEae2ri8vTqzi+D8/T31cpe24RKM4ZmVab5rlQao5wqTEdD5xpfjpwHf51cuLmb3yETaXj+O71cewaO1GLnr0ORSJORnCCb0eZEpJifn6GR+UGq9TSUn67yWWjogRQwTMv0d8woSFS9lV0Z7WVCNAK6op3XUoEDuRXODwULdIwhaXYX6GHOTwRKArUVsfZ+6mzYwG5mzazNodOxNW3J+Il1fX8MaCbxkFvLFgoWcIX2pSAQk8jK3WHOeRgcCt5p+87ALG3n4jo352HYgwOjHGqJ9dx9g7b+LJH13om8CzSuoxtTcT3m/O60Ot
UrwGxEHPTYQP7rqJD++5mXU7djIKeGnKbMqqqh1t6PHERoS5hhOWV9ek0uxfmDidsuoqEzEmd+oZBbwwYTpllVWOf9Bec3Cee3qQsooahn49JRWueB+KFnzHaBQt+Rj4nI3lnfjju5/6InAbiVvG94pGsZL/O1OS6fXfAFcDe9OCTxkNrNiwkZfHT3f2xo0k7kbgxvBEI4Fb5m+6lomy0opK/vveWEYDJZQRS3yu9J/jT8lxS1HHz1GRhC0OA75GP0cuBcYCbyT+L02Uj0u0sx6eCETo63buNO3s/efxEz1/1klMeH3BAlOa8mvzFtja5SWcMGQvPEg44Usz5ppCA1+ZOS8nAvfrhXuF+N33/mdcCSyAVIr2JQqGTZ3Fi9/MNN2jYZNnpW34JHDr9XVr50TEyUSeVJq9xQsfOn6aaX7J/TMzeuEeBJ786Cfv55CvvjElE71vuE6XEaeEQcBDvP7NCn8EbvHCpcSyFIDPcMJYSYxBY+ZQUT0A0FJTC57gR8TpB1xKC/76zpe2z4bJCzfYF2t8uRuBm4jeUBdLk/6g0V9wWeIadaY2db2uAFrxR+AAckWREvrf0eu0/BM4SCl1nlLqBqXUeejQxX8k6v+mlPq58chkOBChV9TV8VDi9cPAnE2b0l56AsaLXF5dzevzF6b2IH2wvp435i9kZ01NqDJKeU0tv3j/U8qrq13nkq2M4ise3NC3vKqaV2fOS59zXT2vzJxLecLrbYyszNWl5cxcs57/A56GVBLIA3V1DJs8ixcmz2RA4mFpsqysutpVRvEay7mde1ZmWbV5n8xkAk/SCy+vrjHX19bx/Php5gWp3Lxw40AO8eDJpJ7yqmqGfpVOpy+1XKe/UElLngQOoaqu2vzl7ELgNi/ci8BjVrI3finFWLOtlPQ6TaW04N88RmVibjVsL9tGWVWNuxfuRuAuMoobget56uKyyioGv/cFjwEr0XuzJa/XwyS99FwDFxXU1fo7mhb+CsxXSv1OKWW6SEqpXUqp+4B5wN+CGg5E6Hti39n7z+PGu5Lma/MW2FOoleJVBy8dfHjhLm7yK7PnMXXtOl6dMz80AreRuAuBg9lTe3nGPC7FHhr4yqy5/ha38pCv/JKqtfz3o8dwJdrrvAzzPbywvp7D6+O2e/TiN7McvGtvAvfywp2yMo3et3X8pJf+3NdTuSxurx86bqrLfTEP5ETgpkN0Or0xmeghdKqecUztpf+NNi1bp29agKxM6w33/mVhvv8HdtoTmAukvXPT3AQGf/SVCzln1sGNMoo+0hfM9BlNzqmkhIGjxqTu28+BK7Fzg7CRXCCxGC3atfV1NDGcCUzM0GYC0Ceo4UCEbv0B9TAwZ+Mm1u7YYWu7s6bG5J0n8WB9Pa/PW0B5dXUoOviO2hpTmr3VS2+MtPoHHEIDX54+hx3V1XYC15PMSOBe0oa1zFi+pmyHyTu/3zL0w3HFSqUoNZQ9UFfHsG9m6nsUEoFb+wI27zs1fsILX72tlOfHT3OsH/r11ITWH5zArQ8yl27YpBN2dmvHySUlPIuOAjJCe+kjuPqUg30ReNZp9aZfMfr93RedQLvWjwAbTN55Eo8qxbMffkVZZWVgHdzLC5eSmInEkxWluyoZPOpz7q+pZSX6AftDmPEw4aBIJZfWZNaj9k+0C4RAhA729N7ewJ++Gm/TwV+dO5/TXdKUe8bjvDbf2Uv3Zln7DU4m8iT11VfnzA9HRrH2c3mQCWkZ5eWZczzTxodNm5PokLuMYi13s/Hbdz/hTOAVoCfOKdqnof8ArfN9YeIMm8302O4yit81wp/7egq96t2v16/feN+z/rmvpiZsmu+p/UvIPa0eYOidNzD7ifuY/cR93Hbe6fR2uU69UXTZo70/GcXDC/ebVp/se+0ZJ3LOsa1pW3I8val2nNsJNbUMev9L8xwCyihuBE7M3HfgyDGpz/n/g/yl/hfv4lwzgWtFxHFbJxE5EbgOvb9oIASKQ1+I8+7ee5Xb
PfQV20uZ26olZzm0B4ht265fuHjeSXh9+zqm2c+ax00npBN4PM172M4mqWf5lm3Ma+W8KztAyy3bbATuOLbLvJzK3Wwkz219aRlbgEmJ4uOd7AKTYzFGWOLPW2/aaiJi+xiZ5+KVlfn9hi3Mb9OKvs5nQIut21nVphV9XcKZ267flPLCzWNYSdI6T1tB6uWyjVtYvFs7Tq2uoao2/UurTcsS2rduRcf1W9LnJNZxzV+26fE9PoRen4fE+1hJCW/85iec0v8RJm+u5zhiaMpsg3biFKhSds35jvt/erl9bhKzmrTPy9TeuW/yM7V45VrmtW1D37awtXwngvPnKufUf/H++2/CeBj4GJgkIq8DX6K3oNsPOBu4Af2pDfxDR5Sy7lHqjmP27qzevvJH3gYbMKnnv99Mo3TWPF4yyDo3tShhrxOP0wk8to7hErgblwKU19Tyy5Gf8NSV/ehgIMogJO5K7E42fJK93y+FhsjKTNsVwx6ZV7NHu3aW9ta5hUfgyfHd5+pCepa29nNy+Tx5fMHhZi8x/96/f4ZZyx5Chy5aMYITjvgTE5+429THNu1MJO5A4IkK53Kx20ueU5tzb5ihlMp6c9Hj9t5TvX/V2b7aHjpkVE5jNTRE5BrgWaAjpoubCmW8Qyn1dlC7Oaf+N1ZWZnl1Na/OdkmznzWPm05MeOlZkHgYe2W+OGUWk1euYdi02fzqzJ52OyF64ZnaZk3iPgjcqV9GAgfb3FN7ZH41ld9cfHa4JO5B4Nb5NlZavc2ew/zXbt2O+65kx7J2y3Yto4TghScq7GUu11WcvhByRSxGSdsm98DTF5RS74jIJ+jnxyeSWJwLLbOMVErt9OjuisBXf0ddNbd/9Ak76qobNSvz5VlzHfXqXUDL2jqemzbL98PMsOPBy6trGDZtNscBw6bMoiwRYldeU8Mtb4yivMawBrnDtXDT0nNN6imtqOLCp1/i7P+8zWEDnuScp97h3bnfoRJaZaYHmUb5RSGMmP0t5/1nBIfd90/O/dfbjJi1AKVc7q9t7nqQeFx4edIsnvpsAu2Bp8aM5+VvZqCs5+HwMNM1qccQzaF1X/NhfkBpzso09iurrObqfwylrKLKpoNnlZVpsF9aUcmVjw7SDzJj/qJRunTeC3iPlrQEPqYle6GDBgHmcWDnvRx18NJdlVw+4F+U7qrEqqFTUmK4NjFADPOXhA2HeSU0/rJdFfzot49z+xNDaH/u7ex+zk9of87/8Mv/vECukMRnrQg1dACUUjuVUq8qpe5VSt2a+P+VbMkcghK6wCtz5id24jE81MxA4LqJ9xPpoNEoy7fqNPtTW5RwCnAOcApwpgjbgK+/X+VK4EBwAteTdPwDthLesCmzOKw+zlzg0Po4L02bDcCLk2fq9Pqps2zXIh8EbjziccVVQ95m8aatrNxSgULx/aZ2/G7kam57fQxxw+KnmaJYlIKfv/wJvx2xiu/WHU9dfR3frjuB37yzkp+//AnxuHIlcAzRKArFLS9+wIB3prMXehPFjsD9b0/nluc/0F8OfgjcEs1hvXbZhhM+8+k4xi9cyrNjdGiuHwL3zMo03KPBH37FuAWLGfzBVyaitLYzlt9z6cm05j6EOlpzBcJ2WnIjUEVJ7E/cc/mp9g+wCANHfcbXc79l0KjPnAk8ReKJcocoGVOsvOEL4anhH/L1rAW8+vFCWqD19BJ2MmT0OiAWeJNjEwTH83E8mhhE5FgRGSwiH4vIf0XksDDsBiL0+rji9bkLdYjg3AXp/TIdkIsXnvH7ISY8efmFjPrZdcQRJqO98zFArVKMBjbt2pVK5gH/Xrhlkq5euFs0SnlNDS9OncVKpRgFrFKKFyfPYk35Dl6aOlun10+erXeRt9pwIHDreE7X162dkbhenzmfVVu3MgQoYWt6d/bat/h6CYye+50j4Tml1b87ewFfL4GKmk8SKeg6Pb6i5lO+Whxn5JyFyQtoOOz39N1ZC/jquzrq4+VsR6eNlwLx
+A7GLqzh3Znz/RF4Bi88m2iUssoqnv10AqOAZz8ZT2mlORzWjcD9ZGWWVlTxzIdfMUrBMx98mfbSY97RKKs2byVGHUOAGDWJezgeOAXFVlTM3rd0VxWDR41hlILBI8dQurMCG4Fn8MKNBG6sL91VydPD32c0EKPUkvr/K6BVS3JEMYYtikhPYBrwC+AC4E5gmogclKg/TEQqROSWoLYDEfrWykpbiGBqknkmcCeie2nGXFMSz/uQWprAmMzTkFmZL06eyeH1cX6UmMelwGH19fxu1Kem9PUXv5npi8CDeOG6nXM8+L+/mMZltOBlsO3OXlH7IM9OWOxK4Fa54dlxi6ioeYAW/DeVgp5Mj6+oeZDBX31L0gs3H2Y7z3y5kNrarnSm2rSkRCeqqKk9mkFfLMiKwL1kFK+sTKOMMviTcVwa1+d2cVzxzCfjMsoodu/acPIGIhz0/ljTZ2HQe2MdCTxlNiGjPDn8Y64Ah3tYSzz+FANHT9H9DV74wJGfphYcuyQeZ9DIT30ReIrEjb8eTF8YMf771gdcXK+v0W621P9fkHvqvz8yb2qEDjyKfvj5I/SCp7egNfT7AJRSy9GrdFwa1HAgQt9WUWlK4zful+lo3KeMYu/oTHSQ9tR2VNfqFPtEEk8pOnHmoUS7ZDKP3gPSpxduGscfgRvLy2tqeHHKLFYolUrgGQCsVIoZq9fzS2t6fWWVbaxcCNwxqQedpr19x3buooZvSMdCPULSw9uLdaVlnnMw3pd1pWXAoZYU9HR6/LrSMkcCt3rZq7dtJ8aHbEel7tvDQCmKEt5n9bbt5pvm5oFbvHDrDXcl8Jg43v/SXZU8+8n41FIAD9TW8uxHX6e0dDcC96ODl1VW8swHX3J/jU5XH1BTyzPvf0HprirzZ9Q4LxHmLFtNXX0dt4PDPVwEdGbN5q0mGaW0opLBI8eYxho84lNKd1XYv9yMXrgLgRu9/9KdlQx++0MeReewVhnm9DBJL93yqyYwFNTX+TuaFk4ARiilPkyk+r+C3mP0HEObeYl2gRCI0DuCLQXbuF9m1l64DwK36uAvTZ9DL+Wdrp3cM9M8VvZeuFOZsfzFb2ayX309l1rnARwMfGC5dsn0ejebWRF4zH4+Q8dP43KEh7GnaGsP724O6LiHA9lZ74se4IA9O9KCv9hT0BPp8V327OhI4NZ5tonV0zHhnVvTxvegmjYl9WkC91rcKsSsTGIxBn/8NZequO2zNPijr3yl1ds99fQxcPRY+1IHccXg979Iz8GS1BNXin6//RtXgOc9PHDvziYZZeA7n3BZ3LqsQ5xB73yMm4ySIm4H+Ua3jYHAf996L3UeVyfumT31fy25QEQoadPG19HE0AJsF2cumiaSWI/OFg2EQIRutf5gfT2vz57Pztqa3Lxwaz8fDzKXb0sk8bRtw5ltWjuma6e89JranGQUa7mbje82bGKlcpgHsIbkahwa2ktPp9c72UyPHYzAjYSXTK+/A+WYoq09vG+59qTEmvu2+2IeSGJw0+lH0JKRthT0v1BJK97lljOOSt872zzT93af3UrY4TCnh9EPSPfZrYUzgfvUwT0JHEzlyb5lFVU8+/E4BtRYlhuoqeXZD7+itKLK7snaPHVnMiyrqOKZ979IecxJDKipZfDoz3UUigORPvHOJ1TX1KS8c7d7+JPzjk0RdOmuCga/+6l9rOpaBr/zccpLd/TCHQgcARWLoSTG9l1VDB7+AfdX1zAXzUwPW+ZkfZ8VxMGpcDuaFuYAx1jKygBjjGYnIPCqY4EIHRxSsFWcl2dpLz1sLxxwfZD5n6su4uv+P+Pr/j/jmh/24JySmHvK/dTZORG4l8dsPLcf7L+v6zzORn/9muZWH9cbOmQho/jRz0X04la96uP8EfcU7d7AJ3O/NXnhycNJB9+4vZwzxcWWKNZvL0uZ8HqYuVvb1p5z6tC2dWgyir6GZnI3Xb/EMejDL12XGzg1HmfwB1/aZBS/afUDR3/uvjRE
fT2DRn+evtAGHfyJV9+jD2S8hx+On5aSUQa+/RG96utdxooz8K0Pnb1wBwJXEkPFWiTmFWPgm6NS1+hyjznlmvoP9r+FIglb/AdwoYhcaChLJdSISAv0D/t51o6ZECixaJEIZznsiVmydZszgbsg7KzM5Vu3e6fcb96atuMyL9dyp4Edf1UIyzZvZW6rVhxfWUVH9Ie6HP01K7ik12/c4vyBtBQ5zSNTQg/oVPb5bVqxfVclCvfU/723lpIkcPMY9i/f7zds5rt2bTm9ro6q2nrqVZwSidGmZQmtW7Zgj/Wb0/fY+sVnwLot29mMXk7CmK+cbLX35m2p/tYvYRPc/qC9ro+LvaVrNjKvbWv6WtdFSjTZbdW6tIySNuBkNj1+onDxqnWplHkntF+xJnGNzNdM1dfxDfoaCS73UIR91m1KzWXxirXMb9fWJMraxpKYaShlvPnGX1jpyWjby9Ywr31b+tKWraXlbHWZU86p/7j8/eVm7y7gt2jBYQHwa6XUeI/2FwJ/RnvU1ehVEn+rlFqcwzTKgbeAD0TkeeAr4NDEeFcCdwMHAn8IajhQ6v+x++6tRv7kSvcGWZB4GFmZ6fYBydqFmO3j+Sf7f42dyJaJMxlmWCHw5hYt2Kf3ydx73hmGsa3eY2b72abV22EhOh8k7tY9170y3fp6npeXR2aR1dzmZbJnIjKXL44gBG5r7zKW6dqIvSzR9vS7/sGsJQ+STvkvpQ0HMpJdXElruh5xFFOHDvAe23SO6ZcpEjfNy/bCYMc8vzc++ZK7H36Fy+rKGG6Q4K6lDW9Tq5Sqy9p9/mGXfdTndzgtc2DH3g8+kzH1X0SuA14F7kIvT3sX8DOgu1JqlUP7w4BvgaeAIegfHX8HDldKHWlt7xciUo++C0nyNb4GqAP+qpT6U1DbuaX+N5QX3tAEDq5euJfNssoqXpw0kynW5V7r6ug5aQa39TnFvMFxCF6447naysIjcFt9oRK41X6hErix3GVev7ymN/f861F2VV0CtDHtWPQjaqjbv62dcMMicFebuuxfr45B1W3nMUtEy1+o4h3P39U+oBQSDzWC5V5gmFLqucT7/iLSDx0H7uQNnwS0BP6glKoHEJHHgbEi0lkptSXLefwZ011JoRr9SOILpdT6bAwHJvS1O3ZyzWsjOHyvPRl8RXrhqcYg8LhSvLdgMc9PW8ba8jLidbv43Xk9+ckPexATyTuBW9s9P3EGveIqpScmkdQtn584nXsv6J3BfmYZJbMX7k3ga7aXccHfhjDm97/goE4dgxG4LtC70E+bx78/ncmytcs5vMth/Lrfifz4tOOJGUnWjfQs57Zq63bOffA/fPHwrzhk7718yShhEnhcKYaPm8bT709n7ZZttG3dmp0V29mxs5T99u7Cn396AdedfVr6fMIgcGN7ty83iXHduWfw7rh5jJ3Ri11V/0sLnuT3VHIR8BiKC6fPpXRnBR077G4bypvADW98Eri13apVy+iN1tNfBO4AnkE/M8pZQ48JElIEi4i0QhP0E5aqMcDpLt2moxXT20RkKNAOuBmYlgOZo5R6JNu+mRD459AfPhlLeXUNc9ZvzHmvTHMn5wdZjg8hASVwx7tf8cdPNjJvw98or/gxFTVVPPzJAu4Y8RVKMthweUjrO3TR4YHkss1bmde6FWe3a2s75rduxdINmx37eWVlWsd3DicE61NT68NpY//+L71LaUUVv3xpFE7RKJmyMuMobnx2NP/76hK+X9sShWLp2tb876tLuGnw6MQ6LMGiUe5+9k2276rkniFvej7MdHqQiYhrViYe0Sip81GK6//+Kv0Hz2bW939mU9lXrNz0D3buLAVgw+Z13DN4Jj95/GXiej0Cy2EYK3VfYuCUVu8Wq566Jvb6WEkJwx+5i4G/6cOBe/2Oy6nkffSaqx+gQ2AHvv0hrg8yTSGjxhudGN/x4ah3KKOKlaAkRusYTKaO4yjhrsTtuhM4jhJ24V/OdYJgD311O4DO
IjLdcPzCYq4z2reybqO0Eb1srQ1KqRXA+ejAomp0JMqxZJHw01AI5KHXxuPMXreF0cBVwIvTZ3PTScfTsb37t2i+ZJRR8xcxYXkJFbVTgCpacDMjUVwZ38W477XnfuWxPzCM5/YrIZgXbm5nPodBN12ZsV9WOjiEJqOs3rqd6cvW6Hu4bBWrt2/noE57EkRGGTFlLl99W09FzXDa8gPeBa7iWyqqv2PswusYMXUeP+71Q98yysrN25i6eIWe06IVrNxaqr10rN6qh70cZJS3Jkzjy9lV7Kr+huRGzDCbGCTOTVFR9Ts+n/0P3ho/nev79nK5NmIvc5uXlxfs8DoG9Ot1Mv/3xBB+h84XHwXcCHxaXcOFwz/grp9cQcfdE35xAB3cu16/Tnv65vPdXLkTRXfgZNryUoobdvJTYDg5QfCUdS3Y4nP5XOu3jDiU6QqR/YDn0cm5bwC7oyMy3xKRc5RScb+TaygE8tDXlJWb0nt3q4/zyqy5pjZW5yU9UgwnLzzbcMLnp35PRe2DWDXFy4hTU3c0QycvzeiBu3rAru3CCyf0moO2a/XCY6bDKZzQdFg9mETXe4a9a0qz/+WLI+1euGVs63wHfTafiuoHaMWt9mUEqh9g4Jg5Ng/cK5zw7sGvm+Z0z6DXnL1woz0/e2X6DCd8+v1p7KoeQJrMoTXXW1LZb6SiagBPj5qc2Qt3mpfx10XArEwS4YT/HT6aXvX1vIJOousHXIR+yndafT2Dhr9n9sLFyct2uG7Je2Jop2IlKS9cpc4jfb5KEnWqI/AQrXnJcr1eIvfUf/vfXA5hi1vQoYFWb3wf7F57EncDuxKbOc9SSo0DfgqchbtMkxEicki2RybbgQi9orYulTDwMLANeGHqrJz2yjSVexC4tXx9+Q702tDWXdAracmHrCkrs43lJaNkReAOJO7WL7iMYh/IS0bRY7hnZepU+zKmL1vNQ4Z7OPX7VazaVupJ4NbzWbt9O9CJEiamPg/GZQTWbiu1E7iLjLJyaylTF68wzWnKouWs3LLNPSvTh4ySIqAkiRudDOOcRFi7ZRvmdcbfJkbc9FkvIQ6sZc3mbdgIPAsZxUyyzl88pbsq+dH/PqRDTiXG4uVrmNO2LUNIJ689iN4lYW7bNiz6fiVhyCjK2C5B4ikCl1hiETAt7+iAjDeJ4ZT6v41cIfE6X0cmKKVqgBloCcWI80lv6GVFOwzx4Qkk3wfiTguWYQ7bD3J4ItCk9sSe3rtbfbh7Zbp65hYb+3foAMx13gWdetrE6jOPFQKBOxGec5tMBG4dzOqB+/fC3XTw/i+McEyz/+UL73ifj4k8Y3TZa09acYct3TuZgt6lU0dXAicWMy1udfeg1xzndPfAV90JPOS9Mrt07oQxjzfpnVvn1Ipfc+DenXJf3MrkEduzMlVMk+Z/h7/H1zPnM2j4+yAxXvr7H7n5qgu5tnUr09x+3LoVt1xzCS/94wE7gcdKXMc2ErjVC9cEXmIj8JRGn2xPC1rzjkvqfyk5QQRp1cbX4RNPAreIyG0icrSI/Af9M+IZPZw8LiJfGNp/CJwoIn8SkaNE7/X5IrAa/eWQLYYBX5PenWgsWtIZm3ivgHGJdtbDE4EI3foDyuillxtSjLOVUbwIPD1jTYD/0+tI2rb4Ey34l0MKeg27KstNu9a72XT82ZaBwJ36BZZRbANlL6M4kXhiUqlj1bZSpn2/KuUJG+/h1KWrWLWtzJHAsZyXxITrTz+CEhalPLIkkino159xlCuBG2WUlVu2MXXRcsc5TfluOSsTiUX52uw42bf/Vb1o1+ZR9BJTZu/cOKcSFKd139dO4FYv3HE843U1E7jpYWbiQWbpzgoGD/+AUUoxePh7lO6qoHRHBYPffJ/7q2tMcxtQXcPgN0bppXGtBG54nSLwTDKK47ycvPYYsN3knRuvVyhwcn5cHSJvKKWGA79Gr5k3G51ge7FSamWiyf7AEYb2Y9H7YF+O3knoU3TUSz+lVC55
U39H/yT8J3CQUuo8pdQNSqnzgIPQmaTHAX9TSv3ceGQyHPhng1PKcZv6OMOmzc6KwL0iR/QM7TdORLji2B9w0B6b6EOlY9rxGZBdWn0sM4Gb/pBdCNyPjBImgdt08JKYaez+z7/tmTbef+hwRwJ3Wtzqo2nzPG19NG2uI4Fbyfmu/77qaeeup18iWxnFi8Ctmzxcd3ZPzj2hPe3a9KI119HHY04vvfe5ncA9tGjT2Kl697T65L0d+OZ76SV243EGvTGagW+M5qS6Om5DLwZiSuevq2fQ66NMY8eV4o1PvuL0n/6RQ86/jTNuuI83Px5LXIGVwJV1Xo5ee3LOkqiL0QZcr1euYYsSLMrFF5RSg5RShyqlWiulTkro4sm6W5RSh1rav6mUOlEptZtSam+l1GVKqYU2w8HwV2B+Qps3fTEkVl+8D532/7eghgNFuSzEPXV82SadXu8YIeLyDWrV0AHHb1undrGY0H3vDkzeUcoJdfXE44pYTGjbooRWLfRp+U2rdxojnKxMBxsx63uLTWMDa1tbZ/fxrXNdu3lbKs3eCXtv3JbywM027eewdtM295R90baSc3ONRAHWbtzKJnGf0z4btmgyTJ2Ty7yMFeIwnumcLM4CEIuV8Oaff8FbX02m/+Oz+QbnzzlATCnTnFLjOY1rGM5vWj1A6c5dDH5zNJMTnviA6hp6vTma3iceyxQFlcCZJSXsuXt709zaf79CEy06rv763/6TL6dsY1flH4Hj2LRtLvc8+hgjP5/O60/eR0ys80qegyTm7HBOhjIlQgtwvV45p/4Lvr3vJoYz0Y8+vDABHdYfCIEIvcd+e/P+z64zlbmStU9iBnyTuNXmwBsud2mXmcT99MtI4JCRxAMRuOM8PUjcZsv4K8jcb+a/B3i0Ndr08HYSdbMHP2S/fm72HAlDt5n7wuO2cmP7rAkcMpK4dV4x4Ppzz+An5/U2lNtJzPG1A4FD9kk9A18f5bDEbpwOB+5PTX29Dg2sr+fTV/7LwV32N/VNTuKtjz9PkPlE0tE7R1FReQlfTD6dtz/5kmsvvcAwFwuJO9w3ZSF8gP53/Jy5L7zC0zX2hQGPdwkHDAI3fmniaE3mEKD9E+0CIXCmaGMSuHu7/BC44xwKlMD1WC6kZ2kblMCdxsqGwB3n1lAEbmxvnJerZ+3keTvZShf59sI9vhRKd+xk8BujUt55EgOqazjh7fdT4YpXAHc8+A8+euFfpkkk5/D0a58lPHPrA8M2VFTez9Ov/I1rL73QJ4GnTzRpXyXeL1myjPnt29PX/GMBgF3bS60RIsERL7hQ7zAwE7hWRP6rlJpurUw8fL0u0S4QAhK6yx+oeTL2wkYgcKe+xSaj+F0bxY+M4lQnLmQmMZ36/9bEmfz3w5ms3bqNLp324p5LT+La3ienUv+DkHjpzgpu/vsQXrrvDjru1i5VEY/HGf71VJ4eNYW1W7bSpXMn+l/Zk+v69iS9Bn8GAjcMlJHASSwD8PkEnn5nAms2beHAfTrT/8dnct15vfW5ZSGjeEozhrKBr400Ld+bRAnQM67oknj/MHDcrHmsWLeZQ7rsZ7oMSmKs3bAJcyimEceyduPGlDxjlVGMxoznmCTxNOHHeG7wv1LNV61dx/mX/4TPRr/JwV26sOdRx89xmYA/iCCtAjupTQEPAx8Dk0TkdXTS7wZ0jPzZwA3g+Kw5I4IRujiRVSN54T4I3KlfwckoHgSuq91lFFdP0zqnHAncai+u4Cf/fJMv51azq/oh4Dg2lc2l/7OPMGryYt743Q3ESow6c2YvfOB7n/P13O8YNPpz/njjFXqceJzr//ISY2dXsKvqT3qc0rnc8/SjjJr4LW8M+DmxFs7jZPTCXc41rhTXPTiYsdN3sKtqgB5z+1zueeJRRn49hzce608s+TA0dR62F55euLnM3G7xspXMa9+OvphRvmMnsfp69k68T4YG3jXgL3zw0n9tdrvsty+bts0FjsKOeXTZT0frOMko
Vi/caFc5/DJJnvc9/3c/pWXl9P/NAEa++ZLDuFmgCCUXpdTnIvITtI5+I3CTsRoduniHUuqzoLaDSy4RgRcngVvte3i6b42bliBzY6r8UVRUX8IXc3ry9sRZXHfWqVaTrjJK6a5KBo/6nFEKbhz1GXdd1Y+Ou7Vj+PipjJ1dya4qyzhVl/D5zF68NX4a159zun3uAQjcSrbDx4xLkPkk05i7qi7h82m9eOuLiVzf7yzjSWW0aS7zIElg2L+STlm63Yp1mzjpkp8wl3SsN6S99JXrNnBIl7Qkq0S4+6aL+OVDj1FReQlm2aWKdm3/wt23XJEm7iwJXBl099Vr1zJ9xmyt70+fycp1G8gZIs6/xosASql3ROQT9I6CJ6A3iS5Dh0eOVErtzMZu9lfLJZwwYyifa7vMafXWvm79AsWD+wgntJ1HLuGElvGt18ItoQdbO2uIZCx92O5Vus5POGGmrMz/fjjdliqv0YaK6gd4+r2p6dN1Cic0HrESBo4cYw7RGzUGYjGeHjmZXVX3O49TNYCnR0xM/dE7xoNb9vnU48VMY6fPERB46u3x3mMO/8pwoxsmK/PO+x/zDO+8+4+P6VBCSYcT/vjSCzmn1z60a3s6MAJYDIygXdszOOeMfbn6kgtREiMuJahkTLlYU/6T10Wfr6IkcQjxRPhiPKb/73/vH0zLN/z63vvsn8Ns4PSZcTqaIJRSO5VSryil7lVK3Zr4/5VsyRyyWQ/dh+SSTy882uQhaTN3HdxiPHO7RPnaLdvx0mfXbN2WJi8Hm8axSisqGTxqDJMNO9P3evdT7rqmH2s3b/UeZ/NWTeCOYzh5xMZz1P9ZdfC1mzZ7j7lps+WXhpeHni5zW9zKPgf7vNasWcdmEfeQ01VrTaGUSgQBXn36z7zz0WcMHPZ31mzYQJf99uPuW67i6ksuQEpKUiEodi/crpsnpRnlcE6r16xj+oxZvJAofxg4bvoMgFYuU/aPwlv/KmeIjzVZklBKrRS9JV2X5Huv9qFILpGMUvgEbrOXQzRKl857sanMXZ89sPNezqQHtmiUgSM+4bK4PURv0IhP6bJPZzaVeoyzd2cX4k6W2YkSvMMJu+y7N5u2e4y57z52Es8go6TrnQiejO3nfDk6YzSKcrAvJXDNZRdzzWUX275clLk5TjJKur31nMzn+6t7f+e4fMObcBi5QARpWZQPRZdhufoeiAGHo3dOEjKoKoEIPf2ZypLAk9PL0Dcf0Sh5DSe0jB9KOCG4k7jXF5wrOXvMIWBIYf8rTuOegY9QUeWgz7Z5lP5Xnp7u6xFOWLpzF4NHpr3zJAbU1NJrxCc8dOdP+d2q9E49xnHat3mUX157pv0ntw8vXI9vHDH9pv/153DPXx9zPre2j9H//51vHjMML9yxvcM5OXjJTvadolGsY1sJ3GjXTuCGjsm6xPvVa9YyffrMlHeexMPAm2HsE12cGvow/BM6aG3dV5/AYYtGHdxWW0AErvt49CdN0mUVldw29C2G/uJa9mjX1lRnaOw6fiGFExreuM8hx5jw6846lZETF/LF7F5UVA1Ar/k/j3ZtHuW8E9tzbd/TEtfHTODWcQa+84lpZ/ok9A5P9azfvJVzTuqQ2KknPU77No9y7sl7cO25Z2DO2tT/5RJOeN2FZzHyy1l8MfV0KirvT59b28c477S9uLZf39T1yEVGsXrhfrIy3ew3BIEb+xq9+F/9729N+n4SYaT+N2V93AtKqVsDtt8I+OqTs+RSDDLKM19MYsKi5Qz5YjK/+9G5ycaeYxebjOJF4NbXsZIS3hxwK299PZWnRz3Ems1bOXDvTvS/qhfXntVTx2r7mNfilZl3ph/+yP/x1thJPPX2o6zZtJkD99mbX157FteeewaxEgOpecgoQcIJYy1KePPv/8dbn43j6df+wpqNmzlw333of8MF/LjfOUgsKVeYCdY8B8t71/b28e1euN2+lcCdxjaedzYyiqmf05dAomzdylVsSej7xg3n
RYRdATagd4LgzB0R3CEqwEU/vst+6tN7brJX2PgsOIHrNsG88DBklLKKSk667wleranlxtatmP7X32gvPR9ZmdAgMor9gWcAEnf78vDQwR3LHbzMQsvKdK93IsrMBO7cPt0gWxnFM5zQNH44MorxtZXAjXM3tikrK6PPKafzl6p2PN3yQP7wjz9ww6+vneFzFyFHnHTEQWri3+711bbtj+/NaaxiQeDEIqBJE7ilgGc+m8SliZC5i+Nxnv1iEr+7/LwmReD2dll64WEQuMFOwRC4yWZQAk9PpKEIHBpWBze+9kvgoD3y1d9tYP74xXz4yhhOrL6At6WE/erifPjaR4SCIkz9DxLlYkXoUS7GNcKtaCwZJaWB33Yte+zW1lZveGOqK6uq4rkvJjG5Vu948kBtHb0+m8TtF/Rmz90si1MUkYxSuquSmx8fzEt/uJOOxhX7MpG4n3n5JdHE63g8zvAvJvH02+PSafbX6jR7ibVwmJfthX8v3KFdmNEout59/NxlFDvh+5FRysrLuf3uX/PsoP/QocMe5vFwIHGHXwbJNts2lLNgwmLmT1jMwglLKN+8A4BKdnCu2sBkNvFnNnPRnBgkNy7KFiLQKvfIxwJEkCgXKzwIJkcNvVC88KQG/uzYb/j95edZJ+w69jNjJnKpZVW7i+Nxhnw2kd9fdaHFjAuZNsHFrQaOHGNIs78y0T6DFx4SgRvbxuNxnWY/rdyUZn/3Px5l5NfzdJp9C+P1DSijhEDg5jIPLzxbAneam+mjkrsO/uzQYUyYNJkhQ4fxf/f+2pXAncap3FXDd5O/Z8GExSwYv5h1S/T2mx0678bRZ3Tl6DO68thTf+K41QuJofga+Ay4sEZ4Cwy7tGcJ6997cWAY2RO6JwKGLZpTcQtBRimvrGbo2Ml6B/QvvuGOC3qnIlW8dPDSikqe+2xiyjtP4oHaOnqNmcgd/c6iY/u2LufUdHXw0p0VDB71mU6zHzmGu67uR0fjr5EG1MHfHDsxQebmNPuKqkv4fGoizf7Cs/wTuMM8nAk8PZFi18HLyst5/vlhjFKKnz4/jJ/fdht77NHB5oUnbdfVxVkxbw0Lxi9m/vjFfD9zBfV1cVq2bknXUw+n97Wn0b13V7oc3YVYLMZrr73GitXf8iqKC0D/HQKfonjbnm4bDCIuSYFNG0GjXIIghCgXhwueRxK31pk0cKVSGrjTfI1fRs+OmUjPuPOqdqfW1/PsmPHcd3XCS28CMkpGIgWICQNHfmJOsx/5KX+8+WrvefklUeNrR7IyR6M8PXycd5r9m49x/UV9nc/FjxeeQafOWkZx8JI9vfAsZRSnrEwrgZv6OXzpDHnuBS6Lx1P3e+jQF7n3N/em2iml2LhymybwCYv5btJSKsorEREO7tGFC/6nLz36dOPIkw+nZZuWpnHiSnj6n09xBopXILW070XAq4QRhE6xeuh5QzjroTdSVmZZRaWrBt6xXVvPcMKl6zYxv00b+uIAgfZrNjgTeRMjcENFIpHnM3Oa/YhPueuai+m4e/rPL5fFrRJDpeAVTpg5zX6LffzG0MFDJXAIO5xQOZA7EqOsrJwXnn+RKdXVADxQVUXP55/n2mtvYM38TSkZZcvqbQB06rInJ118PD16d+UHvbux+167mWzGlf1vf8e2TXxDjInUp7bZfhB9V6ttrbOAgwrQ1FFQqf9A42ZlGmw985mHBn7lBXaSNbx/4dc3W+bQwDKKTVOPORaHGU448N1PuCweN6fZqziDRnzC/T/7sfvcA8goQaJRuuy7T4Y0+71dv8SicEJ7nVVGee65oVwWj3MwMWbTiRnsy6lV+/HHM58AoO3ubfhBryO58BfaC9/nsH0QEU8CV5ZzqWvZmbr6nlzFxxyZ2Kz9SOBS2vJ2joqLHijnTY8KEYWR+q/NJeeRPxnFxmoOWZmlFZU89/kkVw389n5nsmf7du5zasTFraztw/HCnb/0km1Ld+5i8LsOafbVtfR652PuvvZS
k5fuTOaZ5Aq7F+6Vldn/J311mr3jEq+P0f+G9JdyoWZlmudmP+d8yCiZolGUUnw7fSkfPjOWC2pP4So6Uy0tiKk4h8W3MaHlYh5/8W8cc0YPSlqUmPpbSVw5nEs8dZGhpCQOfMxjCTJP4i9U8g5V5AQRaFmUUS7D8E/okM/U/+SHtaEIXFcbCVL3zaiBfzqeP1zTz2CzeRG41cbAdz72SLOPM/Dtj7j/1uvySuCmPiJc1+8cRo51TrM/t2cnftzvnPSuOoZJNDcd3Fc44cZyFk5Ywvzxi1kwYTHlm3dwIN3ZSjmns4Kj1Ua6sZm21LFaWvHl1E857qzjAxO4ta5Du5b02FWbn9R/xPvvtomisFL/U5KofwJ3rHeRUXSV5b0D6S1Zu5H5bVrT1+VLq/2ajY0ro4h1DIexQ5BRTO2N87LYW7wiQ5r98tW6f44yiu18PCScWEkJbz7xW9769Cueeu1x1m7cRJd99+GXN/bjxxf2JWZaEjZLGcVh/KYsoxjDCRdNWcb88YtYOGEJaxfrDSV277Qb3Xt3ZdbSiXy3ZiaLY9WMTxnrkHrV8rvvUCo5nv1crCTuOF8ldNmvE5M3r+c4OmG6OSgq2ErOKEINPZ8IlPr/w4MPUGN//z+6Yy4EDo4ySup9c8jKNL7OlsANNjKGE5rGy8IL90vgxj4Zxm6OqxMaXwch8Hh9nGVzdTjhggmLWTpzJfW19YlwwsPo3ucHdO/dlQO763BC03hZeOFuBG6ti8fj/PKO25k0fjmVlX8g+SurbdvHqaycVapU/Z5kiZO6HqYmDnrYV9u2599UVKn/InIm0Fcp9VCQfoGXz3Xb8LghVifUbY0k17xlFN/hhA0oo3i1i3Rwe52XjLJx5daUhPLtRB1OCHDwMV244Naz6N6nG0eecjit2rTKSQd3na9ymG/yvUo6EiX8a/BzfPz+CF56/ik2rl/DvvsfyE233sl9v77te3JFEW5wkYSItAf2xDmj9grg1yLyEom7lCnCBXJI/dcTCldGcWrb1LMyHccqpMWtXL6gm25WprG9DxnF9FFx9o6Nr3P2wh2+WJJtdpRW8u3EJalwws2JcMK9DujIiRcdR48+3Tj6jK7s3ml3U38/0SiQvReeaqMsXwYJUxITLr78Wi6+/FrTl8B9v77NZiMIlAiqCB+Kisjh6FD908DhQqeh0BExSWR8oBB8PXQ3b7ARVyeMCNxlbCeiNNY3RwI3jd8AMooHgdfW1LNkxoqUjLJi3hpUXNFmt9b8oNdRXPA/fenepyv7Hr4vQcMJIX8EbrwWZi8eW1luEPO9Lx78CzgVGAesAuod2vwwcQwLYji4h+5C4pGMYuibpQ6uX3qQqPF1JKM0ORlFKcWaxRtTafWLpiyjprKGWEmMI044hB/96kK69+7KoT88lBYtcwsndJ2vDxnFaMtK4MZ2xqdv1rJQSF2wOX9FgtOBN5VSP3VrICIPAj9USv08iOEc4tCtnnUDErjVflPwwjMRuLGN63hOttIvm46M4kDSnl643X5DLm5lHS+IF7594w4toSRWJyzdVA7AfkfsQ5/r9LooXXseRbsObU39/XjhcdNFtozvSMAONi1euDlEwvwl4EXgmcbJFqo4CX1PYHE+DGe9losXgRvb6bYRgadfZkHgprbpouIl8HSfpkbgVRU1fDd5WYrE1y7S4YS77dWe7r270r13N3r06cZeXfYy9W8sAjeYSl0PLxklE4E7tcsJxSm5CM4yS87IYk9RH0RFCCQeySi24SIZpfBklHh9nOXz1qaiUZbOWEF9bT0tWreg6ymHc/pVp9C9TzdbOKHCLksUkoySSRu3a/DY2oeC4kz9/xkwM0ObUcCKoIaDSy5JDz2PBG6zV0heeJgEbnydTwJ3rXciaXeC9UPgut5yTh4EbrTXFAgcYOMqvTrhgvGL+XbSEnaVJcIJe3Th/J+fpVcnPFWHExpthOWFZ0/g6esRVAd3Htvyv8OXTk4QQbVsmbudAoNS
6mUfbeaKSOD1zYLFoZMmq0hGcSBwY5tIBzfXNWEdfGdZJQsnLk2HE67SGZB77t+REy48lu59unH0Gd3o0Dl4OGFD6+DGej8yiheB63rznK19coOYP1NNFCJyMzBDKTXfR9sSdAz6XUBffIQqGpH98rlhyijWm1bgMkqjZGVmspkvGcWxvff4QWSUfIQTGvtmk5VZV1PHkpmrUuGEy+euTocT9jyS82/VXvi+R5jDCZuDjOLkhZtsOBB8LlAhE7qI3AX8FtgfWAD8Wik13qO9AL8C7gAOA7YBLyml7gsw7AvAAMCV0EXkAOAXwG3AAehLPjbAGEBgyUWciTzSwXMjcIh0cEv7hpBRjJsdr1m8kQUTdFLPosnfU12hwwkP++HBXNb/Arr36cZhJxRXOGG+CDxU2dv6qz4nU3Id8B+09zsh8f/HItJdKbXKpds/gUvRXwLzgD3QXwZBUA6c5DKncxPzuAydMboVeAIYopRaGnCcbNdDj2SUSEZxtt9UZJTtm3akQgkXjF+cDic8fG9Ov+ZUevTpSrdeXZtEOKGxPt8yijPBO7W3nVJwiITtod8LDFNKPZd4319E+gF3An+wDy/dgP7AcUqpbw1VswKO+wlwnYh8BPwdmA3cgvb6u6Iv/zjgWWCEUqrW2UxmZJH6Hwsuo3h66gFIPAwCd5lbQWdluszDlxfuSeDG9u5fYMWwuFV1RTXfTVnOgglLmD9+kSmc8OjTj6LHmT/IOpwQCiMr007qTl8eOLR3mLMPAnctC8tDVyqIsc4iMt3wfohSakjyjYi0QnvJT1j6jUEn+jjhcnTqfT8R+RD9If4a+K1SapPfiQF3A23RXng/IJ6wtR34N/CsUmpRAHuuyEJDN5BuY+ngDUHgpvHc2ur/ik0HLwYCj9fHWT5/bXp1whkrqKtJhBOefBg9rziZY87sxoE9DgwlnDBbAjfaylUHD0rgjnPNgcBNy2iFQOpKBNXCd5TLlgyrLXZGSxobLeUbgfPszQG9U9AhwPVoj1qhvxDeF5FeSvlbOUwptQ24QkROBu4BrgVao3d12RPo5MeOHwTW0CMZRcNbRgnwhRTJKKHJKJtWbWPBBJ1W/+3EdDjhQd0P4LyfnUn3Pt046tQj8hZOCMG88EKTUZwkE8cyg+EUo5kI3n7u2UEsRBEKrF814lCWRAxNvDcqpRYDiMiNwCLgFGBKoIGVmg7cIiL/C9yEllxuAm4WkQXAEOBlpVRZELtGZLGWi50Ii4rATW3TRU1HB083KPaszJ1llXw7KR1OuGllIpxwvz344QXH0qNPN7r37srunTuY+jcVGSVoVmYYOrhrmYcXbv614TCZHKCCRe15YQs6O3M/S/k+2L32JNYDdUkyT2AJUAccTEBCT0IptR39cPY/iXXPbweuSpT9VUTeRktGk4LaznpxriYpozSUDm6y6XSuTiTtTrDGuTVnGaWupo6ls1alsjKXz0mEE7ZvTbeeR3Duz87kmDN/YAsnLCQCN9swv3cqK0QZxZHAXepzglg/t9lDKVUjIjOA84G3DVXnAyNcuk0EWojIEUqp5Nruh6N5c2VI8xoHjBORTmhZ53bgRuAmEVmglDo2iL3sJJeG8sIbMCsTMnnhjSOjFGpWpvF1WF64tY1SirVLNqVklGQ4ocSEwx3CCY02ClFG8QondBonqIziN5wwFxnFTSdP1YcZshjomagvPAm8IiJT0WR9Bzrm+xkAEXkcOFUpdW6i/efoFP0XROTXibJ/oz1z4wPYnKGU2ooOkfyniJyDJvYrgtrJIsolvblGoxK4sU1zJnDTWLkSODSkjOLUpnTzzhSBL5ywmNKN5QDse9jenH71KfQ4s1uTCSfMN4GbbORRB/ck+HxEtyTtSaCHopntKTU84QkPQMeSzwcuNuwEtD9whKF9XEQuBZ5ChxVWAp8B9/p9IJrlPMcCY0Vkn6B9s1ht0fDGicSz1cGt7YtdRnH8YnFq7z5+Mcgo1ZU1OpwwIaOs+W49
ALvt2Z6jzzgqpYN3OqizqX8hyShNLZwwVxkl7vClYITTvckeYdoCpdQgYJBL3S0OZeuBH4c6CZ8IGBoJBN5TVLKMCfdP4KlxPOozfnk4ers+vfCG1MFNfTITuLk+Wy/cXUZpqNUJVyxYpzd5mLCYpdOX63DCViUcdfLhXP37S+nRpxsHHRNOOKHjfAsgK9PULtXeYc6NJKM4EbjVZrree+zsEXpiUdEju4eiBaqDR6sTFh6BA2xavT0VibJw4hJ2lVYAcNDRB3DuLWfSo09Xjjz1SFq3bV6rE+p6j/YFoINnQ+BhSi9Ov5oiuCMnDb3J75XpMr9IRnEgpgAyyq6yShYmwwknLGHTii0AdNx3D354Xg+69+lG997d6LB30wwndCprMtEobjq5TxnFjbjTXzQhQqyf0wiZkMXiXHaPMcrKzJ7AzfX5I3Bz33AJvK62nqWzViX2ylxkDye8uQ/d+3Rj/6P2M4UTFrqM0hirE+aalRm2jJKJwI3Dx50uQI5Q4X5FFD2yWJxLikcHN7wOUwfX9ZbzLyIZxRhOuGDCEhZN/p6qXdWpcMJL+59Pj96JcMJWLUw2Gn91wvT1KHYd3FQfooxi7B63XgAX+9lAIahY8W1wkU9kIblIYcsoeQonNJd5eOGe4YTG9nayLGQZJRlOmFyhcPuGMgD2ObQzva46me69u3L06UfRdo/2pv5NRUZxJnV3791sw2HORSqjuHnhqTmHrLlEkkswZCG5lCReFjCBO8zDmcDTE2koHbypEHhVZS2Lpy5LhROu/laHE7bv2I6jz+hKjz7d6NEnCie0tW/GBG4aNkQvvVghIr2AnwMnoNdZL0Mvzfu8UmpyNjaziEOXzCRqfF0k4YS63nJORSSjxONxVi5Yp9Pqxy9hyYzl1FXX0aJVCUeefBhX/e4SevTpxsHHHESsJAonLIZwwiA6uGnOGUg8TC+9WD10Efkr8DvsX30nALeKyF+VUn8Maje3KJdsvHBHAje8aXQZxYPATWNl6YWbeMrZOza+zufiVpvXbE954AsnLmXn9l2ADic856be9OjTja6nHUGrtq1N/RsrK9NgKnU9msPiVrno4KZzyaSDm+bkXO/mhZvahJVDKRBqjlKBQESuRe+AtBx4FPgK2IDOVO2LzmS9T0RmK6XeCmI7IKFLBo85XdR0CDzdoNgJfGdZFd9NXpog8SVsXL4Z0OGEx53TPZWV2WGfPUz9m4qMUmwEDt4ySiYCN75uKAIPU0JXyvm8iwD3AJuB05RSWwzly4BlIvIeelmCe4A8ErqgyaWZyihe4YTmuSVLGldGSYYTLpywmPkTlrBs1kpUXNG6XSu69TyCvjeeQY8+3di/6/6pZyIKiWQUQ30kozjUuxB46MlFIqhYdrtkFjiOB16xkHkKSqnNIvIOcENQw1ltQZca2NML90ngrvXZeeGNu7iVO4EbX+dzdcJ1Szenwgm/+2ZpKpzwsOMP5pJ7zqdHn24c7iOcUNflX0Yx/91bvXh7v2wJXNd7tG9AAjfV54HATXazJHBjm6AEHg/RrS5SDb0lsCtDm51Aq6CGA0suSmKRjGLiqcaVUcq27GTBhCUpL3z7+lIA9jmkMz2vPIkefbrxg15H0a5j8HDCSAd3Jq3mpoMHJvHQ+FwcP1NFgO+Bi0TkD06rNopIDLgIWBrUcHY7FhWwjBJmVqZ5bsmSxpFRjKsTLp62IrXJw+qF6wBov0c7jj7jKLr/8gJ69OlG54M7W8Yyk44uKxwZpTGyMjOWNbCMEkZWZkYSzzOBhxuHTrF66K8BjwHvisj/GTbPQESOQq+LfgyQ5ygXEftqixGBW+yGQ+DJdjqccH1qcavF0w3hhCdF4YSe7QucwN3GbqoEbu6bO7Mrwv+CKBA8CfQDLgMuFZE16O3u9gcORP/hfJVoFwi5rbYYhROaO4Yko2xZW5rywBdOWJIKJ+zSbX/OufEMuvfpRrfTjqBVuzam/pGMYjw1n2WRjJIXEg9HRheU
lGRu1sSQ2A7vPOA3wG3AYcBBierlwFDgH0qp+qC2s49Dz5cX7qWDg80LL4bVCSvKK/l2cjorc8OyRDjhPh04ru/R9DjzBwUfTmi2YX7vVOZF4Ob2xjLLXBuCwN3qHQjWiMbKyiwUAg/ruWiIm0QXFJRSdcBf0ZtCdyCRKaqUKs/FbnDJxeShh0Dgpj6ZCdxcnxuBG9uHJaP4IfC62nq+n72KBROWMH/8YpbPXkW8Pk6rtq34Qa8jOPun5nDCpiKjNIYO7tTe1K6BZZR86eBNicDDkFvS07I7GMWGBImbiFxEzgT6KqUeCmIr67DFxpFR7PabgoyilGLd95tTHvh3k7+naqcOJzz02IO4+K5z6dGnG0eceCglrVqa+hdiOKGxPpJRnMcuRBnFFk6YA4nbCT1suSXhwDQDQnfB2cCfgPwRuiJBekUXTmjoHERG8Qon3LqLhYkHmcZwwr0P7kTPy0+ie59u/OD0I2nfcTdT/0KSUZoTgUNuWZlBCNw05yYio/gl8BCdc6A4wxZF5EsfzQ5NtP0CWAd8qJR6M1OnLFL/nTx0HMrMpF/sOnhNVS2Lpi1PpdWvWrAWSIQTnn4k3fufT/c+3dj7kL3N/TGTii7LTUbxInCjraYko/glcKPthtLBjcM3BxnFq13YUSlFGuXSB3x/U52d+P8GEdlHKfWUV+PAqf/aQ9dv856VaWpfWDJKPB5n1UIdTjh//BKWTFtGbXUdJS1LOCqxOmH33l055NiDiZWk5+lE4LqucLMyncYJmpVpsuFY51FmJAzroJa5NKqMkoHArf0ykXguBG6zGdALt+rg7uObxwxtYa6kPYoyyiUj74rIA8BDSqkSEdkX+BC4HQiR0AFljUNvJgQOsGVdKfPHLWbhRO2F79yWDifse+MZdO/dja49j6R1u8JYnbDQCNxsz6k9tvbNSQcPTOBeNhuFwMN1pxXOn/NmgtSJK6U2isinwP9m6pTlaot2Qm2svTLzKaNU7qhi4Tffaxll4hI2fL8JgD323p3jzj6ao3t3pXvvbnTcr6O5P2bS0WW56eC2OVvqCklGCVsHN9qOZBQXmyHKKNY+3kvjKsd24aA4NXSfUJg/HtVAm0ydgnvo1gW5PAjc2N5K4GZbyZLGzcqsq61n2ezVzE8k9Hw/a2UqnLBbzyM4+4bT6d6nGwf4WJ1Q17l44U7kljOBp69HsWdlmuqz8MIzEbjJbpYySkMSuH4fzAsPooN7eeHm8wyd0R3/rpoDlFKPAI8Yij4BSjP1yyIOPUYgGcWTwKEhZRRrG6UU65dtSXng305aosMJRTj0uIO46M5z6dGnK4efeDgtWzft1QmdxgkqozQEgWeqL1QCN7bJ17oohSCjeBG419jZIu7wN9EcoZSaCkzN1C57Db2JyijlW3eyYOLSVEz4tnWlAOx90F6c9qMT9eqEZxxlCyd08sL9RKPkI5zQbMP83r2ssGSUQs/KNI3rMD+rjeYioxhJ3OtXQhhQODtOEdyRheSSfuocFoGb+4YjoxjDCRdPX5GIB1/Mqvk6nLBdh7Z0P+MoLrlbrxG+96H5Dye0XgdrXSHp4MZ6Ly/clbCsgxlsF3pWpqlJEyVw+/jOc7dOKBcCD5vQ9cwiQg+CgIlFiYcUWergxtf5XJ1w1bcbWDB+EQsmLGHx1HQ44ZEnHsqVv7mY7n26cuhxh0ThhNb2PgjcVJYDgZvqfRK429iRjOI0vmWcPMgo1vfxkEMWUeL4NxnBHVlILsZNopMvGldG2bKulIUTlqS08B1bdwLQpet+nHXD6fTo041uPY+kdfvgqxNCw8gomT1uc5nzlwcO7R3mXOBZmbo+fBml2LMyG9oLD53AHRB56MGQxZ6i0NgEXrmjim8nf59YYtYcTtjjzG6JzY7t4YQRgRvb54nA3eojAm+SOngQAg87ykVh/gxFyIzgW9AlMrcaMpywvq6e72evTmzyYAgnbNOSbj2P4Kz/14sefbpxQLcDcgsn1IWO88Vg
K58yStBwQl3v0d6R4D3KjMRkHRTjeRon4FDvQUS6vjhllGy866Yso1hthx25WB+yvWJHFotzSYOsTrhh+Vbmj1+kVyf85nsqd1QhIhxy7IFcdMc5dO/TjSNPOowWrQtjdUKDKfx44c6k7u6FexG4qX1DeOEOBA7eXnihZWVa2zc9GSU7ArePHS6Bh/tgVCDk9dBF5C7gt+jdgRYAv1ZKjffR7yhgJiBKqd1ynMMhftsqpVYGsR3YQ4+nNPRwZZTybbtYOHFJanGrrWu3A9D5oL045dIT6NGnG0f3tocTRjKKsX3TIXDj60wEbppzROC2drqts3372Hkm8FAJ3fx5yxUich3wH+AuYELi/49FpLtSapVHv1bAm8A44KwQprIM/4tzBfpGy26T6ARykVFqqmpZMn0F8xMyyqoFa1FK0bZDG44+/Sguvuvc1OqEzU1GcSbkSEax9ku+aQ5ZmV4E7jW203svEnfSweMetm2DY7mmOUDh/HeZA+4Fhimlnku87y8i/YA7gT949PsbMBf4mnAIfRh2QhdgP+B4YF9gIrA0qOHgqy0iNnKGzAQej8dZ/d1GFoxfxPzxi23hhJff248eZ3azhRNC48ko2RK40zjNRQd3m0eYBG5sE4UT5kbgTraDeuFWAg9zxcWwNPmEl30S8ISlagxwuke/S4BLgROBq8OYi1LqVo/xWgIPAncDru3cEFxysaX2u8edb11fpiUUSzjhAUft2yzDCb0I3NTe0UPPUJb0Vg1jhC2j6Hr3sY2vHQncNq69PhOBQ35IPJJRnG07De7kgVtJPCQeDuKhdxaR6Yb3Q5RSQ4z1QAmw0dJvI3Cek0ER2R94DrhKKbVDxPdcsoZSqhZ4ILGJ9KPAdUH6B38oirh64ZU7q0ybHa9fqsMJOyTDCXt3o3ufrnTcb09Tf4WZVHRZ4WRlOpU1RlamqSwHLzybrEy3sZ24silnZXrZbciszMxz9E/iYejgmbzwsAjcOo36uG8S3aKUOtmPWct7cShL4lVgsFJqst9JhIhJwE+DdgoetijpsMX6unqWzVmT2ORhMctmraS+TocTdj3tCPpcr8MJu/wgczhhwxO4Ph9ju2wJ3NQu1d5hzk1URol0cP8EDlayawIEbh2c7Ag8/NUWQ10+dwtQj9apjdgHu9eexDnAWSLyp9SEICYidcBdll8AYaMt+ldFIAR+KLpuxdaUB/7tpKXpcMJjunDhL/rSIxlO2KYVYCDnJiKjOJN6Bu89Vecw5wKXUaJoFExoNuGEWejg1o+PE4GHnlwUUpSLUqpGRGYA5wNvG6rOB0a4dDvW8v5y4H7gVGBtKBNzxxB0ZE0gBCL01d+t576z/wpApwP35JRLfkj3Pt04+oyj2G2v3QEDMWG/GdHqhBnKPAjcaLsQszJN47rMvxAI3G87/wRunlBjZWU2hA7ul8BViAKM22c4SzwJvCIiU9FRJHcABwDPAIjI48CpSqlz9dhqvrGziJwMxK3l2UJEugH3AD2BPYCtwGRgkFJqdjY2AxF6q7atuPG+a+jeW69OKCI5E7ixLp86eL4J3Fgf6eAO9ZEO3rAySkg6uO18fBJ4GJ66wv73n5M9pYaLSCdgADqxaD5wsSF5Z3/giNAG9ICI3AC8gJmDj0B7/3eJyB1KqReD2g1E6Psc0pmzb+xT9DKKE+EUkoyS93BC05yc6wvZC49klGR7bwIHO4n78cLzReB2o1AX9qbTSg0CBrnU3ZKh7zB0DHlOEJEewPPAcvQ+oeOAcnRI5VfAv4EhIrJQKTUliO3AGnpcmROXCpHAncqKWQfX9eHLKM2XwM0Tau4yil8CD1keSSA8D72A8H/oDNDLlFJLgGTQSIVS6mMRmQUsQS9RcE0Qw8GXz41klEhGSRY3QxklCIFDccsoTp/LMNdyUTirAEWAvsAnSTK3Qim1QUTeQ0fZBEJWG1w0dFamsT4fWZkmGx4etxOBG183JxklMIF72QzohTdmVqZ9jv4J3Ml2
U5NR/JJ4WApMfrz+Rsd+uEfWJLEa2Cuo4SwkFwt5Zkngul0ShUngTmWOHrhhwGIlcGt7p3novtl74RGB2wdvbgRuRtHuWLQLaJmhzVG4x8e7IgvJRVyINJJRor0yXWz6IPBs20UyivO4+ZZRXAk85G2MitRDXwcc7FYpImcDP0I/OA2EwKn/yYeihbQ6oa73aN+ABG6qz8ILz0TgJrtBCNzwpiEJXL8vbC+80Be3yieBO0wnUeaTxB0uTlg6ulLhR7kUCCYCPxaRlom1W5I4X0SOR5P5RuChoIaz2LHI7l0XwuqEJhuOHrpTe2ztm5OMkguB22z68K4LhcDtcywsAgc7iReMjJJHAndEAyyI1QgYjl6G9zT0uuygb3kv9PIEo4FfKaXWBzUcXHJxJfHCl1FcScM6GMbztE8mH+GExuGDyChBCNzaPpJRDGPmUUbJdnXCbLzwvOjgPkk8L6HoefyuaCwopb4CuluK+wI7gcVKqV3Z2g4sudSnCNOdwJ3KmpsO7jZ2sengXnabK4HrPt5eeFgE7tivgQjc1UZIzK4oTkJ3gvKxDZ4fBJZcjE+evQjcXBaejBI2gWeqj2SU5Gtn8iwUGaXQdXCwk3hT1MHdvwTyw7xFGocOgIicDtyC3jxjD6AMmAW8oJT6JhubwQhdGSUErxDDDN57qg739o4eeoayxGvfOrhbvQPBGhEtbpW5nbVPQ3vhTTUrU/fzQfQhE3ig/h4VYevpxeqhi8h/gP7YPyYnALeKyNNKqV8FtZuVhp4rgRv7RKsTRgRuqokIvCgIPAxiVwrq6nM2U3AQkbvQqywuBR5D71W6Eb2X6JnoJXr7i8gSpdR/g9jOImwxklEyyigZCNzaz4lwo70yvQkcIhklPU97u3zJKPkkcGcUpeRyN7ABOE0ptd1QvgJYISKjgQXo5X3zR+igibaprU4I3l54oeng1vb59sK9POHsvPDsCNw+dtMjcKeyxiLwQP09KoKSeDxEPb1IJZfD0XuebneqVEqVicg7wO1BDQffJNoWvZL4P9LBmzGBmycUySh2Eo8IPDgUeXvW2tjYCtRmaFObaBcIWWwSbSfxppqV6TYPRxJ3JVnnfpGM4nfswvLCnfijaGSUEHVwVxIPk4BV0Xro7wKXi8gApVSVtVJEWqG3uxsZ1HD2i3MVuw5uqAhDRsmFwG02IwJ3tZcPGaXJErhLRcESuIPpYnwoin7oeRrwpYj8ARinlIqLSAzoAzwOlCbaBULwsMW4U4IQ3mWJ15GMYn8dRaO423Uct5FklEgHT3ZwGdfDVgQb5gKt0FvejQXqRGQr0Ik0J68H5oh56QNRSh3qZTgLySXLtVGcvGyrcUMdZOeFu5K5g81sZZSGJHD9PpgX7sV5YRC419hO70PNyrQOTuPKKH4JXM/TqbAwZBRPHbyxSFw5/6IqAsSBanRUixFrMvTLeLGzllwgHAI31eeBwE12i5TAvce32C4wGcWJR5qaDt4UCVxPJVwvXIW8dK4eL3STjQ6lVN42os4iscj8fz7DCXV985VRsiF6r7lbJxTJKNkRuC6zFzY3GcWLwJ3uWVAoijbKJW/IitAbQkbJRODG4YPIKIVG4F52C5XAIWQZxZHcGsYLL8jVCRtRRsnFAw+DxG1jFqOLnkcE09CV0aNuPB3cZDcDgZuaNBKB6/eZyTkbHRysZOePwL3Gdnof6eDJeToVFoaMkg8dPGsvPAQeVgpqizPKJW8IvsGFkkhGCeiF51dGyUCmkYwSySg+bUEOBJ5hzGwhkYMeCFk8FNX/RwTeWARunlBE4BGB6w4u43rYgsIjcNsQEaEHQvCwRSWFFU5YAATuZTcKJzTatDT3MW6xbfLQ2DJKTjp4Jn7PwxPM4oxazB8Ce+j1Dle40LIyre2d5qH7hu+FN2Y4oX2O/gncyXauXnhTX9wqjKxMLztN3QvPB4Fbx877GEWGwJmi0HQI3G4/fAIP0i4bHTwicI9+IRN4oP4eFc2ZwMOM
SlE43+OmDhF5FRiq9N6ioSKr9dAjGcWZwO3jO8/dOqGClVEcyc2bxJ3+/iIZxToNL9Z0Kc6SwCG/Mko+wwqVgpq6ImR0+Anw/0RkMTAUGKaU2hKG4eAPRZOLTWXywoMQuOVN6iFrtDphTgTuZDtsLzyfBO4wnURZ9l54c8jKbGwCj4eYr1+U21toVABHAX8HHk1sajFEKfVFLkazSixyJHBIVTR1GSXSwY32HMbIMKZTWZMgcJcKT2KNZJRQCdw+dt5MNzbeAP4B3JQ4rgF+LCLL0F77C0qpTUGNBn8o6kDg4IPEC4DAvew2pIziReBO7yMZJZJRbOMVkRfuMZO8SjqNDaXUYmAAMEBEzgVuBq5C7zP6kIi8DzynlPrUr83AmaK+vfCIwD3m0YAEbh2ciMCdEBG4N3wReB44vrkEuSSkli8SG0hfgyb3K4GrRGQ52mN/LJOdWPCBdehiXOmjXunPk8JA+CpdH1eaMFIH5vZK6b/R5Gul9Ic3eeBm03Qow+FsNx5Xru2Mh/HcrHO3zN7STpkPzzmaj3gc02G75ga7YJ5f4sOQOlI3yXDELYfpfiSfiRgO65jJcW3vLf9c+xmmk/4cWebscF660H5xrNfPqb/Zhr3QzYbb3PRUVOowd3C4gBls6WsVTx1WWO+ZvbPzeNqu/R7Y2njMS48fNx3OjSxHyFAKamuVr8MvROQuEVkuIlUiMkNE+ni0PVtERovIehGpEJG5IvLzUE7OBUqpnUqpYUqpvsCRwEPAnsAjfvqHJrkUgg7ut521T0PLKIWYlek0brRXpgdRuFQ1hg6ubXs3aEgvPEyZJLDH6QERuQ74D3AXMCHx/8ci0l0ptcqhy+nAPPSDy/XAhcAQEalSSr0e4tSc5rovCQ8d2MNvv8Bhi+lYdEN5gRO4fXznuVsnFOngDUDg4FtGKSYCh+KRUfKpc4e8guO96BDB5xLv+4tIP+BO4A/Wxkqpv1iKBotIX+BqIHRCF5HW6L1EbwYuQH+fbQH+CTzn0TWF4FvQJa6vE4n7InBjB6JoFNc5BvTCo2iU4ASup+LG1K5dGsULb7BoFJ/SSUM8rLTKdBnQWUSmG94PUUoNSb5JbLx8EvCEpd8YtCfuFx3IvLNQIIhILzSJXwt0RN+FL4EhwEilVJ1fW8HDFuMN64V7k6h7O3cvPCJwc5/GIfBA/T0qQvPCC4zAte3cvHDfkSgheuEq5OiXAKn/W5RSJ3vUdwZKgI2W8o3AeX4GEJFLgXOBM/xOygP7isgA4EZ0PLpKzOWv6CzSZdkYzXq1RWhcGSXSwbMjcN3P+uXQDAkcQpdRGpvA9RwaXkYJm8TNtsM3aXkvDmU2iMgZaJnll0qpqSHM41LgksTYY9De+GilVE4rwGex2qJ+7Ubgus5QFZCcG1NGyUUHd7LdGF54k83KdKlo0jJKjgQOhSmj5JPAzeMoqmtDG2sLUA/sZynfB7vXboKI9AY+Ah5USg0OYS4K/ZD1RbQ3vjIEm0BWGrpKTymBpkrg9jk2PQJ3KosIPNnJtUukg/sYK93OP6l6/iLKAiUh5f4rpWpEZAZwPvC2oep8YIRbPxE5E/gQ+LNS6t/hzIargfdVHr4ZA0suxgejEMkoXrYjGcVfRSSjeKOhZZTGJHCb/XA1lyeBV0RkKjARuAM4AHgGQEQeB05VSp2beH82mswHAa+JSNK7r1dKbc52Ekqp0dn2zYTAkot1PfSgXrgX5xW6jOL02c3VC3f6uBaNjNKIBO5lDxpPRmlIAvc7nm5XOCSegvL3q8a3OaWGi0gndLr9/sB84GKD5LE/cIShyy1AO+A3iSOJlcChoU0sRGSxOFdhyyiFroNHBJ65XE+lQGSUiMCDEXiIHrUidA8dpdQgtMftVHeLw/tbnNoWKgJr6EG98EhGiWQUP+WRjOLUKHOTRidwbTxY+8IwXZTIOsolGxklysp0f6/7ZSb6nAgcfHvhxUTgUDxeeL7CCRvLC/ceRlFT0zARNcWC
gJJLejGfpiqjRDp4JKM0JQL3O55uVzgEHuSLxwsxiVz0IMhug4tIRmlS4YSB+ntUhOaFNzSBZxhT2y4cEm90AtfGAzbPA/GqzPclghlZEXpE4P4IXPfz4akXO4FD6DJKROAZhm2KBO6AYuZzETkavTBYT/QaLqXAFGCQUurbbGwG3uAiriIZxe297td8ZJQonNCpUeYmfsfT7YpPRvE9HsXroYvIbehomxJL1UnA7SJyt0qvCukbWSzOlR2B6/fK8TU0fjghRDKKn/LmJKMUFYFr4wGbZ0emYZGwUoqa8FL/CwYichowGChDb1zxMbAW6AL0Ax4EBonIbKXUtCC2c1ucKyJw934RgTvXNXUCh0hGsfbLkxctQElxPhT9DXpdmT4WaWUxsFhExgCzEu2uC2I4Zw89LBklMIFbByeSUZwQySjeKCovvIFklIaUQYpUcumDXlnRUSdXSn0nIu+RxTK9OSYWZU/gEHnhfvq69nepaM7hhNDMZJQiJHDTuKpoCb0jsCRDmyXo3YsCIYvEokhGMZdFMkqTl1GaCoFr4wGbN6yMEu6DU5cNsps+tgCdMrTZB9gW1HBOa7lAyDKKI7k1noySE4FDk5ZRoqxM/+Ppds3XC89n9EtDR9Y0EOYCXrsrkaifF9Rw8IeiRayD+yVwPU+nwqZL4F72INLB7e0iAs834gqqizP1fyhwt4h0UUqttVaKyAHomPShQQ0H1tBtRZGMkn1/j4pIRnFHURG4Nh6weVOWUfyjWKNclFLvAu961K8DzsnGdqhx6InJWAss7Z2I1vI+w5iuZVFWZqKTS3ETJXA9h8Jd3KrYCDyXMcPqn7BSrJJL3hDwoagqaB08dAKHJi2jRDp4fggcIhkl7P7ORnM7p0KFiBySbV+VYf/R4FvQEckonv09KiIZxR2RjNL0ZJR8e8+KLK5708AytKKUDWJeldlFuTSQjBIReLKDy7getqAICByirExrvyImcPt4FOt66MPIntA9EfyhaOKmNgcZpZh0cIhkFM9hi4zAcxkzrP65QlDF+lD01nzZDpxYlCSFpi6jNPuszExjRlmZTsYDNm+eOngmx8K3HQL8kosAZBXlYnhtq2ueBO5pPyJwS6PMTfyMlW7XtAkcikNGCYvEzUZz+4JrjghO6Nb3kYximUYko9gbZW4SySj5HTOs/ik7+SBwx3GKk9BF5FjgLuBQ4Hvgn0qp5bnazX21xSyzMqGRvfCQCFxPpUC88CZE4H7H0+2icMLG6p+y00AEbhm10XX8fEBEegJfAS0NxdeLyAlKqdUichiwALhLKTUsiO2sMkWbg4zSHAgcmpmMEhG4fzuNQuBmxONQXV3f2NPIBx5F//X+CE3sVwEvAPcBdyullovIAuBSdESMbwT00FXqwxrJKMlOLsVNNJxQzyGSUYIiIvHwIaJoESusOYWEE4ARSqkPE+9fEZGfYU73nwecFdRw8OVzDYwSrU6Y2RY0Hx3c73i6XeSFN1b/lJ2QCTx0eUTlwWZhoAV6yzkj5gKnGd6vB/bPxrB/qJBJPJJRAo2n7UYEbjHsv22qS0TgYSGfhKso2oeic4BjLGVlQFvD+05AbVDDWWSKOpVlT+CB+ntUFLIX3hxllOYcTpjLmGH1T9kpdC88A4o0Dv0fwEgRuVAp9WmiLPWwQERaABfTEOuhQ0TgEBG4c9tIB2+s/ik7TZzALYMXq4deDrwFfCAiz6MfjB4KICJXAncDBwJ/CGo4px2LonBCw1hRNIr30JGMEmr/lJ0mJqMEQVxBdVVRRrmMRa/looD/AX5B8hElvAPUAY8opV4LajiL1H+Ix+PMnzqCbz5/lfJta+iw54H0Ou//ccwpVxOL6cXACsoLb2gCzzCmtp2bFx4tbtWwXngxZ2UWCoFbIUCLksKcW474M86Lc1WjH5Z+oZRan43h4FvQ1dUx/Jk7WfbtSmpr7gOOY9eOubz/6uMsnPEF194+EEmQekTgRttNT0ZpzgSey5hh9U/Z
aUIEHq5EoogXWChlGFBKPZIv255r69qgFPOnjUiQ+XjgauAo4Gpqaybw/bfLmTttpPbijfc1WWCpUEqZjkzloIkjeZjnZjhs03a2BfqPJXlYEVcqddg7ZhgzrkyH15zcJam46XBuZDlckGmsdLt46sgE13vhbNh8+IDfOZv6ZLjufsfLSsLJsX/KjuEzGdpCVyHMy9V2ltc7qG2vwy9E5C4RWS4iVSIyQ0T6ZGh/rIh8LSKVIrJWRB4UkbwsfRsGghE68M3nryc88zaWmjbU1vyBKV8kZJ88ELiJODwI1cue1x+LkcAzkri1yscHrLEI3PNXiYHAM5G4671wNpwTgfv+FREReF7n5mg7h2seaBwVLqGLyHXAf4C/oJN7JgEfi8jBLu07AJ8BG4FTgF8CvwXuzeW8ROQXIrJdRC60lP+viExPHL/IxnZgyaV8+xrgOJfaY3W9hcSdEMkoyTlEMkpQRDq4g7086+CNEW2ilKK6ui5Mk/cCw5RSzyXe9xeRfsCdOEeU3AC0A25WSlUC80XkaOBeEXlSZX/RLwSqgM+TBSLyU+AJoALNy8+IyAal1HtBDAcm9A57HsiuHXPRUosV8+iwZxdnacPj3AuGxJsQgfsdT7eLolEaq3/KThOLRimEcMHWrWIceWjbzA2BcRnqRaQVcBKaNI0YA5zu0q0XMD5B5kl8CjyCDjNc7mtydhwPTFBKGUN47gI2AD2AEnTm6K+AQIQuQT4UIrIZ2AFtDoHuMfODWgUsjEPVSmBbkElEiBChKHGIUmrvbDuLyCdAZ5/N26C93iSGKKWGGGwdgI4gOUspNc5Q/iBwg1Kqm8P4Y4A1SqmfG8oOBlYCpyulvglyPgYbO4HBSqnfJt53ALYCTyul7k2UPQP8SCl1QBDbwcIWc7g5ESJEiBAESql++TBreS8OZZnaO5UHQRVmb/h09PPMrw1lW4C9ghoO/FA0QoQIEZogtqDT6/ezlO+DfujphA0u7fHo4wcrAWN0zZWJ/ydaxtka1HBE6BEiRCh6KKVqgBnA+Zaq89HRLk74BugjIm0s7dcBK3KYzuvAKSLyvogMBW4FvlZKbTG0OQ5YEtRwROgRIkRoLngSuEVEbhORo0XkP8ABwDMAIvK4iHxhaP86OupkmIgcIyJXoTehyCXChcR409ALcP0M2I4OhyQxj73QkszwoIazWpwrQoQIEZoalFLDRaQTMAC91vh84GKl1MpEk/2BIwzty0TkfGAgMB1NvP9EfzHkMo9dInI6cB46LHK80TtXSm3DvDa6bwSKcokQIUKECIWLSHKJECFChAZEPjNFIw89QoQIERoQIjICHap4YDK5KJEp+hLpTNHWwBVBM0UjDz1ChAgRGhZemaIHoTe3WI/OFA2EiNAjRIgQoWGxH4awx0Sm6CnAcKVUqVJqK/A+cHRQwxGhR4gQIULDIsoUjRAhQoQiQd4yRaM49AgRIkRoWLwO/ENE3kcvIXALIWWKRoQeIUKECA2LZ4Br0ZmioFenDSVTNApbjBAhQoQGhoiU4JIpmpPdiNAjRIgQoTgQPRSNECFChCJBROgRIkSIUCSICD1ChAgRigQRoUeIECFCkSAi9AgRIkQoEkSEHiFChAhFgv8PxL2wL6GpLvoAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_classifier(cilantro_X_train, cilantro_y_train, lr, proba=True);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An important hyperparameter: `C`, the inverse of the regularization strength (default is `C=1.0`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr = LogisticRegression(C=0.001)\n",
    "lr.fit(cilantro_X_train, cilantro_y_train);\n",
    "plot_classifier(cilantro_X_train, cilantro_y_train, lr, proba=True);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Smaller `C` leads to less confident predictions (probabilities closer to 0.5).\n",
    "- In general, we say smaller `C` leads to a less complex model (like a shallower decision tree).\n",
    "  - A truly complex model needs a larger `C` in conjunction with lots of features.\n",
    "  - Here we only have 2 features."
   ]
  },
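  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of `C` on confidence can be sketched on synthetic data. This is a minimal illustration under assumptions, not the cilantro dataset: `make_classification`, the random seed, and the dictionary name `mean_confidence` are all choices made for this example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Synthetic 2-feature data: a stand-in, not the cilantro dataset\n",
    "X, y = make_classification(\n",
    "    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0\n",
    ")\n",
    "\n",
    "mean_confidence = {}  # hypothetical name: C -> average of the largest class probability\n",
    "for C in [0.001, 1.0, 1000]:\n",
    "    lr = LogisticRegression(C=C, max_iter=1000).fit(X, y)\n",
    "    proba = lr.predict_proba(X)\n",
    "    mean_confidence[C] = proba.max(axis=1).mean()\n",
    "    coef_size = np.abs(lr.coef_).sum()\n",
    "    print(f'C={C}: mean max probability = {mean_confidence[C]:.3f}, sum |coef| = {coef_size:.3f}')\n",
    "```\n",
    "\n",
    "If the claim holds, the mean of the largest predicted probability should sit near 0.5 for `C=0.001` and grow as `C` increases, along with the size of the coefficients."
   ]
  },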
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr = LogisticRegression(C=1000)\n",
    "lr.fit(cilantro_X_train, cilantro_y_train);\n",
    "plot_classifier(cilantro_X_train, cilantro_y_train, lr, proba=True);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Back to the IMDB dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9833333333333333"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr = LogisticRegression(max_iter=1000)\n",
    "lr.fit(X_train_imdb, y_train_imdb)\n",
    "lr.score(X_train_imdb, y_train_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8256"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.score(X_test_imdb, y_test_imdb) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr = LogisticRegression(max_iter=1000, C=10_000)\n",
    "lr.fit(X_train_imdb, y_train_imdb)\n",
    "lr.score(X_train_imdb, y_train_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8044"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.score(X_test_imdb, y_test_imdb) "
   ]
  },
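  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above: higher `C` leads to more overfitting. With `C=10_000` the training accuracy rises from 0.983 to 1.0 while the test accuracy drops from 0.826 to 0.804."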
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8949333333333334"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr = LogisticRegression(C=0.01)\n",
    "lr.fit(X_train_imdb, y_train_imdb)\n",
    "lr.score(X_train_imdb, y_train_imdb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8416"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.score(X_test_imdb, y_test_imdb) "
   ]
  },
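  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above: lower `C` leads to less overfitting. With `C=0.01` the training accuracy falls to 0.895 while the test accuracy rises to 0.842, so the train-test gap shrinks."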
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZoAAAEQCAYAAACJLbLdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAoxElEQVR4nO3deZyd893/8ddn1iwiG7FFJBGiQVQNVYSgilZpf5aii1KU3i3aWxc3VVvQUtxt75boXYpq71K9qSq9kcUWxFpBkElEEAmyJ7N/fn98rxNnzpyZuTK5rnPmnHk/H488zsy1nc+xnHe+13e5zN0RERFJS0WxCxARkfKmoBERkVQpaEREJFUKGhERSZWCRkREUlVV7AJ6o80228xHjx5d7DJERErKM8888767b567XUGTx+jRo5k9e3axyxARKSlm9ma+7bp1JiIiqSp40JjZSDP7pZk9YWZrzczNbHTMc/uZ2VVm9q6ZrYuusX+e4yrM7DwzW2BmDWb2gpkdnfiHERGRbhWjRTMOOA5YBjyygef+N3AacCFwBPAu8ICZfTznuEuBi4BfAYcDs4A7zOyzPa5aRER6pBh9NDPdfQsAMzsV+Eyck8xsN+BE4BR3vynaNgOYA1wCHBltGwGcC1zp7ldHp08zs3HAlcB9CX4WERHpRsFbNO7e1sNTjwSagf/JulYL8CfgUDOrjTYfCtQAt+Wcfxuwq5mN6eH7i4hID5TSYICdgfnuvjZn+xxCsIzLOq4ReCPPcQATUqtQREQ6KKWgGUbo18n1Ydb+zOty77gsde5xIiJSAKU0j8aAfM80sB4e136n2enA6QCjRo3qSX0iIr2bOzSugoblsG4ZrFse/Zz1evCFYF1+XW6wUgqaD4F8CTA0a3/mdaiZWU6rJve4dtx9KjAVoK6uTg/pEZHeyR2a1oSgyA2JfNvahcoK8NbOr11RBZP+HWo3SbTkUgqaOcAXzWxATj/NBKCJj/pk5gC1wPa076fJ9M28nHahIiJdcofmdfHDIndbW0vn17ZK6DcY+g+F/kOg3xAYOjq8Zm9b/5q1rWZg4q0ZKK2guQe4GDgW+D2AmVUBXwL+6e6N0XH3E4Lny9HxGV8BXnL3+QWrWETKW3NDF7ehlnUdFq1NXVzYOobFkG3jhUXtoFTCYmMUJWjM7Jjoxz2i18PNbCmw1N1nmNl2wDzgEne/BMDdnzez/wGuM7NqYD5wJjCGECpExy0xs2uB88xsFfAsIYwOAo4qwMcTkVLS0pi/ryJOy6Kloetr1w4OAZAJgU23ihkWm0JFKY3V6lqxWjR35Pz+6+h1BjCZ0HFfScdRcScDU4DLgCHAC8Bh7v5sznHnA6uBs4EtgbnAce7+t2TKF5FepbW552HRnDtjIkfNoCgEBocQ2GyHeGHRbzBUVCb/WUuQdRwFLHV1da7Vm0UKrLUldFavD4FlXYTF8vbbmlZ3fe3qgZ0Ew5D2IZEvLCpLqYehuMzsGXevy92uf4Iikpy21pywWB6jZREd37iy62tXD2gfEkNGwZYT47UsqmpS+LASl4JGRNprawtf+j25DdWwkvzT2CKVte1DYNNtYItd8rQyhnbcVlXb2VWll1PQiJSjdhPzlm9gWKyArpYkrKxpHwKbbAGb7xQvLKr7p/FppZdT0Ij0VpmJeT0Ji3XLu5+Ylx0CA4bD8HExw2JArxs+K72bgkYkTZmJeT0Ni7bmzq9tFR1DYP3EvCFdh0XNJgoLKRgFjUgc6yfmLd/wsGhtzH9N4KOJeUM+CoHBI2OGxaCymmsh5UtBI31HS1PPw6JlXdfXrh380TyL/kNhRG6fRScjo8psYp5IPgoaKS2tLT0Pi+Y1XV+7ZlD7MNgst8+ik7DQxDyRLilopPAycy2yw6DTsFjRflvTqq6vXT2wfQgMGxPjNtRQTcwTSZH+z5KNs3oprFy0YS2L7ibmVfVvHwJDtoV+u8ZrWWhinkivo6CRnln5
Dky/Ep67Lf8w2sra9iGw6dYwYkL3YaGJeSJlR0EjG2bth/DYdfDkDeEW2J6nwtjJHcNCE/NEJKKgkXia1sCT18Oj/xlufe12PEw+D4ZuV+zKRKSXU9BI11qb4dlbYMZPYfV7sOPhcPCPYYudi12ZiJQIBY3k19YGc+6Chy+DZfNh1KfguFtg1N7FrkxESoyCRtpzh3kPwYMXw+IXYcTOcOKfYYfPaMkSEekRBY185K2n4aGLYcEjMGQ7+H83wi7HaOa6iGwUBY3Aklfh4Uvh1Xth4OZw+FWwx9c1J0VEEqGg6cuWvxXmwrxwe5hRf+AFsPeZULtJsSsTkTKioOmL1nwAj/wcnr4RMNj7W7Df92Dg8GJXJiJlSEHTlzSuhlm/hsd+ERaY/PiJcMCPwhIvIiIpUdD0BS2N8MzNMPMqWLMUdjoCDr4QNh9f7MpEpA9Q0JSztlb41x0wbQosXwijJ8EJf4KRdcWuTET6EAVNOXKH1x6Ahy6BJXNgy4nwletg+4M0F0ZECk5BU27efAIevAjemgXDxsIxv4MJX9RcGBEpGgVNuVj8UpgL89r9sMmWcMS1sPtXobK62JWJSB+noCl1yxbAtMvhxT+H588f/BP45BlQM6DYlYmIAAqa0rV6Ccy8Gmb/Ljyvft+zw58Bw4pdmYhIOwqaUtOwEh7/JTzxX9DSAJ/4Ghzwg/AESxGRXkhBUyqaG2D2f4dWzLoPYecvhiVjNhtX7MpERLqkoOntWlvgxT/BtCtg5aIwRPngC2Hr3YtdmYhILAqa3sodXv17mAvz/lzY+hPwhV/D2AOKXZmIyAZR0PRG8x8Jc2Heng3Dd4DjboWPfV6TLUWkJCloepN3XwhPtpz3EGy6DRz5S9jtRKjUvyYRKV36BusNPpgX1iN76S/Qfyh85jLY81So7l/sykRENlrsoDGzgcA3gP2B4cDp7v66mR0PPO/ur6ZUY/lavQSmXwHP3gKVNTDpXNj3LOg3uNiViYgkJlbQmNm2wHRgJPAqsAswKNp9IPBp4NQU6itvd54CC5+APU6G/b8Pg7YodkUiIomLu9Liz4FGYAdgDyC7V3oGoZUjG6JhJbz5OOxzFnzuaoWMiJStuLfODiHcKltoZpU5+94Gtkm2rD7gzcfAW2H7A4tdiYhIquK2aGqAVZ3sGww0J1NOH1I/A6r6w8i9il2JiEiq4gbNi8DRnew7HHgmmXL6kPrpMGpvqO5X7EpERFIV99bZVcCdFiYM3h5tm2BmRxFGoh2ZQm3la9V7sPQV2O34YlciIpK6WC0ad78L+BZwLPBgtPkW4Bzg2+5+f9w3NLNtzexOM1thZivN7C4zGxXz3DHRucvNbI2ZTTOzujzHLTAzz/PnC3HrTNX8GeFVy8mISB8Qd3jzYOAm4FbgU8AI4APgcXfvrO8m33UGAA8TRrCdBDhwGTDNzCa6+5ouzh0OPEroK/omsBb4XnTuXu7+Ss4pDwAX5WybG7fWVNXPCBMzt5xY7EpERFLXbdCYWRUhVL7o7n/joxZNT5wGjAXGu/sb0fVfBF4nhMc1XZx7JrAFcEDWuQ8D9cDFwHE5x7/v7rM2otZ0uIf+mdGTwgPLRETKXLe3zty9BXgPaE3g/Y4EZmWCIrr+fOAx4Khuzt0beD3n3DXAI8ARUSD2fh/Wh+X+x04udiUiIgURd9TZbSQz839n4KU82+cAE7o5txVoyrO9EegPbJ+z/fNmttbMGs1sVq/pn6mfFl4VNCLSR8RtBSwATjSzp4G7gXcJ/SvrufvvYlxnGLAsz/YPgaHdnDsXOMTMhrv7BwBmVgFkJqIMyzr2b8DTwHzC7bZvA381s6+6+235Lm5mpwOnA4waFWtsQs/Uz4DB28Kwsem9h4hILxI3aP4ret2GsARNLgfiBE3m2FxxHrRyPXAWcIuZnUUYDHA+MCba37b+Ddy/0+7iZn8FZgFXEFpnHYtynwpMBairq8tX48Zra4X5M2GnI/RsGRHpM+IGzZju
D4llGe1bHhlDyd/SWc/d683sy4TQy/TTPAtcC5xLaGV1dm6rmd0B/NTMtnL3To9N1bsvQMNy3TYTkT4lVtC4+5sJvd8cQj9NrgnAyzHq+IuZ/S+wI9Dk7vPM7DfAW+6+sJvTM02IdForcWj+jIj0QRs0UsvMdgEOILRKPgBmunu+zv3O3ANcbWZj3b0+uuZoYF/gR3Eu4O6twCvRuVsDXyKsXNBV3VWEyaYL3X3xBtSbrPrpMGICbDKiaCWIiBRa3AmbVcDNwAm0709xM7sd+HoUAN25kdAxf7eZXUBoXVwKvAXckPV+2wHzgEvc/ZJoWzXwM8JjCVYSWkbnEVpJP8869wTCUOn7outuAfwboW/phDifNxXNDbBwFtSdUrQSRESKIW6L5ieECZEXEjrTFwNbAl+J9tVHr11y9zVmdhChX+VWQmg9BJzj7quzDjWgkvbDr53wPJwTgSHAIsIAhMvdPXvY83zCygVXEVpeawkj0A5z9wdift7kvfUktDTAGN02E5G+JW7QfAW41N2nZG17E5gSPZ/mZGIEDUDUl9LZStCZYxaQMxItmjh6RIzrzwIOilNLQc2fAVYJo/ctdiUiIgUVd8Lm1sATnex7PNovXamfDiProHZQt4eKiJSTuEHzDqHDPp99ov3SmXXL4Z3nNKxZRPqkuLfO/gCcb2Zt0c/vEvpojidMmvxpOuWViQWPgrepf0ZE+qS4QXMRYdXli2m/9L4Bf4y2S2fmz4DqATByz2JXIiJScHEnbLYQ1jqbAuxPGM31ITDD3budaNnn1U+H7faFqppiVyIiUnAbNGHT3ecQ5q1IXCvfgfdfg098rdiViIgURazBAGZ2spld1Mm+i8zspESrKif1mWVnJhe1DBGRYok76uxswpIz+SwBzkmkmnJUPx0GDIcR+ZZ4ExEpf3GDZhyd3zJ7hY4PHRMIj22ePyOMNquI+49aRKS8xP32awE262Tf5gnVUn7efw1WvavVmkWkT4sbNE8BZ3Sy7wzCWmKSS/0zIiKxR51NAR40syeB3wJvE562eSrwCeCQdMorcfXTYch2MHR0sSsRESmauPNoZpjZMcB1ZC3nDywAjnb36YlXVupaW8KKADt/odiViIgUVex5NO5+N+E5MuOB4cD77v5aapWVunefh8YVum0mIn3eBk3YBHD3uZmfzWy4u3c27Llvq58eXsfsX9QyRESKLe6EzdPM7PtZv+9qZouAJWY228y2TK3CUlU/HbbcFQZ2NlhPRKRviDvq7DvAuqzfrwGWEyZqDgYuSbSqUte0NjxRU6s1i4jEvnU2CngVwMwGAwcAX3D3+8zsA+CKlOorTW/NgtYmGHtgsSsRESm6uC2aSqAt+nk/wIHp0e9vASOSLavELXomvG73qeLWISLSC8QNmteBz0U/Hw887u5ro9+3JjwyQDIalkP1QKgZWOxKRESKLu6ts6uBW6NVmocCx2btOxB4MenCSlrTaoWMiEgk7oTN281sIfBJ4Gl3n5m1+z3gnjSKK1lNa6B2k2JXISLSK2zIhM1HgUfzbP9JohWVg6Y1atGIiES0dn0aGldBjVo0IiKgoElH0xoFjYhIREGTBg0GEBFZT0GTBg0GEBFZT0GThsbVunUmIhKJu6jmY2b2VTOrTbugkuce3TpT0IiIQPwWTTPwe+AdM7vGzHZKsabS1tII3qo+GhGRSKygcffJwMcIYfM1YI6ZTTezL5lZdYr1lZ6m1eFVLRoREWAD+mjcfa67fw/YBvg6YaHN24FFZnalmY1Np8QSkwkaDQYQEQF6MBjA3Rvd/VbgbOARYHPgB8BrZnZHn38IWmOmRaNbZyIisIFBY2b9zewUM3sKeJoQMmcTVnA+E9gH+EPiVZaSpjXhVbfORESAmGudmdmuwDeBLwMDgbuBH7r7tKzDbjSzxcAdiVdZSppWhVcFjYgIEH9RzReAd4DrgKnu/m4nx70BPJFAXaVrfYtGt85ERCB+0BwL/K+7t3Z1kLu/
Qng+Td/VqMEAIiLZ4vbR3AP0y7fDzAZqiHMW9dGIiLQTt0XzW6AaODHPvhuAJuCUpIoqaZpHIyLSTtwWzYGEAQD53AMcnEw5ZaBpNVglVGm1HhERiB80I4AlnexbCmyRTDllILNys1mxKxER6RXiBs0SYNdO9u0KfBD3Dc1sWzO708xWmNlKM7vLzEbFPHdMdO5yM1tjZtPMrC7PcRVmdp6ZLTCzBjN7wcyOjlvjRtHKzSIi7cQNmnuBH5vZxOyN0fya84G/xbmImQ0AHgZ2Ak4CvgrsAEwzsy7HA5vZcOBRYBfCnJ7jo13TzOxjOYdfClwE/Ao4HJgF3GFmn41T50bRQ89ERNqJOxjgQuAQ4BkzexpYRFjzbC9gPnBBzOucBowFxrv7GwBm9iLwOiE8runi3DMJt+gOyDr3YaAeuBg4Lto2AjgXuNLdr47OnWZm44Argfti1tozekSAiEg7cVdvfh/YE7gCMODj0esUYM9ofxxHArMyQRFdez7wGHBUN+fuDbyec+4awnprR5hZJjQPBWqA23LOvw3Y1czGxKy1Z5rWqEUjIpIlbosGd19OaNlcuBHvtzP5R6/NIUwK7UorYRh1rkagP7A9MDd6j0bCKgW57wEwgdAKS0fTahi8bWqXFxEpNYV+lPMwYFme7R8CQ7s5dy6wQ9RXA4ROf8Ltu8y1M6/L3d3zvEf2ce2Y2elmNtvMZi9durSbUrrQqD4aEZFssVs0ZrYL8A1gPB1XCXB3jzuXJjcAINyG6871wFnALWZ2FrCWMBAhcyusLetaG/we7j4VmApQV1eX7/x4mtaoj0ZEJEusFo2ZfRKYTRjBdSih9TEWmAyMI15QQGjN5GtRDCV/S2c9d68nrB69B+G22DvAp4Bro0MyC31+CAw16zCRZWjW/vRo1JmISDtxb51dDtxF6P8w4BvuPhr4NOFJm5fFvM6c6Bq5JgAvd3eyu/+FMNptAjDO3fcANgHecveFWe9RS+izyX0P4rxPj7W1QvNatWhERLLEDZqJhFFbmVtKlQDu/jAhZK6IeZ17gL2zH/tsZqOBfaN93XL3Vnd/xd3nmdnWwJeA32Qdcj9h0MCXc079CvBSNMotHZkFNbVys4jIenH7aKqBNe7eZmYfAltl7ZtLmEQZx43At4G7zewCQnBdCrxFWJwTADPbDpgHXOLul0TbqoGfATOAlYSW0XmEFszPM+e6+xIzuxY4z8xWAc8Swugguh9CvXH0LBoRkQ7iBs08wi0rgBeBU8zs3uj3k4HFcS7i7mvM7CBCv8qthNtwDwHnuPvqrEON0GrKbnE5YRWBE4EhhEmjvwMud/fcYc/nA6sJj5nekhCGx7l7rBUMemx90AxK9W1EREpJ3KC5l9Dxfzuhv+bvhFZFK6GP5Ky4bxj1pXS57pi7LyBngIG7twBHxHyPVsItvbh9R8lY/xhntWhERDJiBY27/yTr5wfNbG9CWAwA7nf3f6ZUX2nRrTMRkQ66DZqob+SzwIuZjnR3fw54LuXaSo8e4ywi0kG3o87cvRn4MzA69WpKnZ6uKSLSQdzhzfWEh59JVxQ0IiIdxA2anwHnm9nmaRZT8prWhteaAcWtQ0SkF4k76uwgwtIx881sFmG5l+z1wNzdT0q6uJLT1hJeK2IvISciUvbifiPuBzQDSwlLu+Qu79LzRSjLiUfrelplcesQEelF4g5vTvdhYeXCW8OrFfrpCyIivZe+EZO0vkWjf6wiIhmxWjRmNqq7Y7JWT+67Ms9aq9CtMxGRjLh9NAvovh9G365tunUmIpIrbtCcQsegGQ58jvAAtEuTLKpkrb91Fvc5cCIi5S/uYICbO9l1jZndSggb8TaNOBMRyZHEPZ7bCC0e8VbdNhMRyZHEt+IIoF8C1yl93qagERHJEXfU2f55NtcQnqx5HvBIkkWVLG/TiDMRkRxxBwNMp+NggEyP9wzgzKQKKmltatGIiOSKGzQH5tnWALzp7rEe49wn6NaZiEgHcUed
zUi7kLKgoBER6SDWt6KZ7W1mx3Wy71gz+2SyZZUoBY2ISAdxvxWvAHbuZN/Hov2i4c0iIh3E/VbcDZjVyb6ngInJlFPiNOpMRKSDuEHTr4tjK4GByZRT4nTrTESkg7jfiq8AR3ay70hgbjLllDgNbxYR6SDu8ObrgRvMbCVwI7AI2AY4HfgG8K10yisxWutMRKSDuMObbzSz8cB3ge9l7wKudfepaRRXcrxNKzeLiOSI26LB3c81s98AnyY8IuB94EF3r0+ruJKjUWciIh3EDhoAd58HzEupltKnUWciIh3EnbB5spld1Mm+i8zspESrKlUadSYi0kHcb8WzgQ862bcEOCeRakpdm26diYjkivutOA6Y08m+V4DtkymnxGnUmYhIB3GDpgXYrJN9mydUS+lzV4tGRCRH3G/Fp4AzOtl3BvB0MuWUOG/V8GYRkRxxR51NAR40syeB3wJvEyZsngp8AjgknfJKjEadiYh0EPt5NGZ2DHAdcEPWrgXA0e4+PfHKSpFGnYmIdLAhEzbvBu6OVggYDrzv7q+lVlkp0qgzEZEONmjCJoC7awHNzmjUmYhIBxsUNGa2GzCe8NiAdtz9lqSKKlm6dSYi0kGsoDGzIcDfgb0zm6JXzzpMQaOgERHpIO634uWEfpn9CSHzReAg4A9APbBXKtWVGm+DCgWNiEi2uN+KhxLCJvM450XuPt3dvwY8SFiiRtSiERHpIO634lZAvbu3Ag3AoKx9dwGfi/uGZratmd1pZivMbKWZ3WVmo2KeO8rMfm9mC81srZm9ZmaXmdnAnOMWmJnn+fOFuHX2iEadiYh0EHcwwGJgSPTzm8CngOnR7+PivpmZDQAeBhqBkwh9PJcB08xsoruv6eLcgYTWUzXwY2AhsCdwMbAD8KWcUx4ALsrZlu6IOY06ExHpIG7QPEoIl3uBW4GfmNlowhpoJwH3xLzOacBYYLy7vwFgZi8CrwPfBK7p4tx9CYFyqLv/M9o2zcyGAeea2QB3X5t1/PvuPqvDVdKkW2ciIh3EDZqLga2jn68iDAz4EjCAEDLfiXmdI4FZmZABcPf5ZvYYcBRdB01N9LoyZ/tywi3A4i8ypidsioh0EOtb0d3nufsj0c/N7v7v7j7S3Ye5+4nu3tmzanLtDLyUZ/scYEI35z5IaPn81MwmmNkmZnYQYSDC9Xluu30+6sdpNLNZqffPQFi9WWudiYi0U+i/fg8DluXZ/iEwtKsT3b0B2I9Q8xxgFfAQ4Xbet3MO/xuhlXUo8GXCAIa/mtlXOru+mZ1uZrPNbPbSpUvjfZoORbZp9WYRkRwbvARNAjzPtm6/nc2sH/A/wAjgq4TBAHsBFxL6is5c/wbu38k596+EodlXALflLcp9KjAVoK6uLl+N3dOoMxGRDgodNMsIrZpcQ8nf0sn2DWAyMM7d50XbZprZCmCqmV3v7i/kO9HdW83sDsJtt63c/d2eld+Nwy6H2k1TubSISKkqdNDMIfTT5JoAvNzNubsCy7JCJuOp6PVjQN6gieRbNidZ4z4NwAerG/nny+9RU1nBwNpK+tdUMbCmkv41lQysqWJATSUDaqvoX11JZYVutYlIeSt00NwDXG1mY929HiAaJr0v8KNuzl0MDDWzcdmj1oBPRq9vd3aimVUBxwIL3X1xT4uP64aZ9UydWR/r2J22HMQ39hvDUR/fhpoq3XYTkfJj7un9Bb/Dm4VJly8A64ALCK2LSwkrDUx099XRcdsB84BL3P2SaNto4EVC4Ewh9NHUESZvvgbs5e5tZnYCYaj0fcBbwBbAvxEGEpzg7n/qrs66ujqfPXt2jz/nsdc/TlNLG784YXfWNLayrrmFNY2trG1qZW1Ty/rX1Y2t/HPOYl5dvIotNq3l6/uM4cRPjmJw/+oev7eISLGY2TPuXpe7vaAtGndfEw1JvpYw8dMII8fOyYRMxIBKskbFufsCM9ubMNv/MmAzQpBMBaa4e1t06HzCgIGrCP1Ba4GngcPc
/YH0Pl3Q3NrGv95ewYl7bcd2wwd2e/x3P70DM19/nxtn1vPT+1/lVw+/zvF7jeLkfUczcuiAtMsVEUldwUeduftC4OhujllAnpFo7v4ycFw3584irCxdFHMXr6KhuY2PjxoS63gz44AdN+eAHTdnzjsruHFmPTc/voCbH1/AERO34rRJY9llm8HpFi0ikqJiDG8uay8sWg7A7tsO2eBzd956MNcdvzvfP2wnbnp0Pn98aiF3P/8O+44bzmmTxnLAjptjmqcjIiVGvc8Je37hcoYNrGHk0P49vsY2Q/pzwRETePy8gznv8J14Y8lqvn7T0xx23SPc+cwimlraur+IiEgvoaBJ2PNvLefj2w5JpOUxuH813zxgex75wUH8/NjdMINz73iBST97mOtnzGPFuuYEKhYRSZeCJkGrGpp5Y+lqdhs5JNHr1lRVcPQeI/nH2ZP4/Sl7scOIQVz5j1fZ98qHuezel3l7+bpE309EJEnqo0nQvxatwJ3YAwE2VPbAgZfeXsFvH6nnpscXcNPjC/j8xK04VQMHRKQXUtAkaO57qwDYeev0l6HZZZuOAwf+Nxo4cPr+27P/Dptp4ICI9Aq6dZagxqiTfmBN4fI7e+DAj6KBAyf97ikO/89H+IsGDohIL6CgSVBrW1hloaqy8C2Jwf2rOSMaOHD1sbvhDv+eNXBgZYMGDohIcShoEtTcGloPlUW8ZVVTVcExe4zk/nMmcfPJezJuxCZc+Y9X2eeKMHDgHQ0cEJECUx9NglrbnAqDil6wIrOZMXn8CCaPH8FLb6/gxmjgwM2PL+Dzu23NqZPGsPPWGjggIulT0CSopc2pquh9jcRdthnMfx6/O98/dDw3PbaAPz21kL8+9zb7jduM0/cfyyQNHBCRFPW+b8US1trmRemfiWvk0AH8OBo48MPDduK191bxtWjgwF3PauCAiKRDQZOg5ta2kniQ2eD+1Zw5eXse/eFBXHXMRNrc+d6fX2D/n03jBg0cEJGEKWgS1NrmVJVA0GTUVFVwbN22PHDO/tx88p6M3XwgV0QDB6b8XQMHRCQZ6qNJUEubU9kL+2i6kztwYOrMen732AJueiwMHDht0lgmFGASqoiUJwVNglpbS6tFk88u2wzmFyfszg8Oaz9wYNIOm3HaJA0cEJENV3p//e7FmtvaevVggA2xfuDAj8LAgbmL2w8cyMwZEhHpjoImQaXWRxPH4AFh4MAjPzyww8CBqTPnsUoDB0SkG7p1lqDQR1NeQZNRW1XJsXXbcsweI5n+2lJunFnP5fe9yi8eeoPthg+gX3Ul/aor6FdVSe3610pqqyrW76utio7pZHv714+uU11pul0nUsIUNAkKfTTl3Ug0Mw4cP4IDx4/gX4tWcPtTb7JkZSMNLa00NrexYl0zDc1tNLa00tDcRkNzK40tbRs1R6fCyBtS7YKsqoLa6LXTIKuubL8vc05OQGaO6Q0rPIiUAwVNglrKqI8mjl1HDuaKkRNjHdvW5jS2dAyghubW9j+3tNGY9Zp9TL4Aa2huZcW6ZpZk/Z59TrTOaY/UVFZQG6Mllgmp2qrKzo/pohVXW50JPrXepDwpaBLUUoZ9NEmpqDD611TSv6ayYO/p7rS0eachFQIttMQaon2ZkGsfWDmhV6DWW25IZVpkcVtvH23vOiD7VVWq9SapUtAkqLWM+2hKkZlRXWlUV1YwqF/h3jeJ1lu7kMu6Tqqtt6xbid2FVG2eW43r++nUepMcCpoEtfSBPhrpXsm33jLHZW1f2ZB8682MKKg6b72170/LH2zZARanFafWW+EpaBLU0tamoJGiKMfW28p1ze32J9V6q6609aMiN7b11lUfXHb41VRW9OnWm4ImQS1tTr/qvvsfk/Q95dp6yz43ydbbR4HVXX9anpGQHQaedB+QveVWvoImQeU4YVOktynH1tuqhhaWrmrMCsrkW2/rA6ybW4znfXYnaquS/YuDgiZBLa2luaimiHSvGK03CI8fKVTr
rbG5lf/47McS/wwKmgSFPhq1aEQkOdWVFQVvvSVNQZOgSTtszlaDS/i/BhGRFChoEvTjIyYUuwQRkV5HHQoiIpIqBY2IiKRKQSMiIqlS0IiISKoUNCIikioFjYiIpEpBIyIiqVLQiIhIqsx9I1ZsK1NmthR4s4enbwa8n2A5pUCfufz1tc8L+sw9sZ27b567UUGTMDOb7e51xa6jkPSZy19f+7ygz5wk3ToTEZFUKWhERCRVCprkTS12AUWgz1z++trnBX3mxKiPRkREUqUWjYiIpEpBIyIiqVLQJMDMtjWzO81shZmtNLO7zGxUsetKi5mNNLNfmtkTZrbWzNzMRhe7rjSZ2TFm9hcze9PM1pnZXDO7wswGFbu2tJjZoWb2sJktNrNGM1tkZn82sz7zhD8zuz/67/uyYteSBjObHH2+3D/Lk3wfPWFzI5nZAOBhoBE4CXDgMmCamU109zXFrC8l44DjgGeAR4DPFLecgjgXWAj8B7AI2B24CDjQzPZx97Yi1paWYYR/x78GlgKjgB8Bs8xsV3fv6aTmkmBmJwC7FbuOAjkLeDrr95YkL66g2XinAWOB8e7+BoCZvQi8DnwTuKaItaVlprtvAWBmp9I3gubz7r406/cZZvYh8HtgMuEvG2XF3f8I/DF7m5k9BbwKHAP8vBh1FYKZDQGuBb4L3F7cagriFXefldbFdets4x0JzMqEDIC7zwceA44qWlUpKtO/vXcpJ2QyMn8D3KaQtRTZB9Frc1GrSN/PgDlR2MpGUtBsvJ2Bl/JsnwP0mXvZfdQB0esrRa0iZWZWaWY1ZrYDcAOwGPhTkctKjZntB3wN+FaxaymgP5hZq5l9YGa3J93HrFtnG28YsCzP9g+BoQWuRQrEzLYBLgEedPfZxa4nZU8Ce0Q/vwEc5O5LilhPasysmhCmV7v73GLXUwArCLdAZwArCX2P/wE8YWa7J/XvWUGTjHyzXq3gVUhBmNkmwN2EDtOTi1xOIXwV2JTQF3ku8H9mtp+7LyhqVen4IdAfmFLsQgrB3Z8DnsvaNMPMZgJPEQYIXJDE+yhoNt4yQqsm11Dyt3SkhJlZP+AewpfuAe6+qMglpc7dM7cGnzSzfwALCKPPzihaUSmIbhedD5wK1JpZbdbu2miAwCp3by1GfYXi7s+a2WvAnkldU300G28OoZ8m1wTg5QLXIimKbqv8BdgL+Ky7/6vIJRWcuy8n3D4bV+RS0jAW6AfcRvhLYuYPhJbcMmDX4pRWcEb+OzU9oqDZePcAe5vZ2MyGaPLivtE+KQNmVgH8ATgYOCrNoaC9mZltAewEzCt2LSl4Hjgwzx8I4XMgIWTLmpnVATsS+uaSuaYW1dw4ZjYQeAFYR7if6cClwCBgoruvLmJ5qTGzY6IfDybcQvkWYVLfUnefUbTCUmJmvyF8zinAvTm7F5XjLTQz+yvwLPAioaN4R8K8ki2Bvdz9tSKWVzBm5sAUd0+kv6I3MbM/APMJ/56XEwYDnAesBT7h7ok8YVRBk4Do3u61wCGEJudDwDll2lkKrP+fL58Z7j65kLUUgpktALbrZPfF7n5R4aopDDP7IWEFiO2BGuAtYDpwRTn/t52rzIPmPOAEwn/bAwhD1/8B/MTd303sfRQ0IiKSJvXRiIhIqhQ0IiKSKgWNiIikSkEjIiKpUtCIiEiqFDQiIpIqBY1ICTCzT0WPUX7HzJqi5dz/z8xOMrPKYtcn0hUFjUgvZ2bnEB6kN4ywuvCngVOA14DfAEcUrTiRGDRhU6QXM7P9CbPxf+XuZ+XZvz0w0N1fLHRtInEpaER6MTO7j7Ba9Eh3byh2PSI9oVtnIr1U1PcyGfinQkZKmYJGpPfajPC0xzeLXYjIxlDQiIhIqhQ0Ir3XB4TnHHX2eAKRkqCgEeml3L2FMOLskJzn14uUFAWNSO92JTAcuCrfTjMbY2YTC1uSyIbR8GaRXi6asHkN4cmtNwMLgaGEx2ifCpzo7ncXqz6R7ihoREqA
me0DfBfYjzAabRUwG7gFuN3d24pYnkiXFDQiIpIq9dGIiEiqFDQiIpIqBY2IiKRKQSMiIqlS0IiISKoUNCIikioFjYiIpEpBIyIiqfr/7pDX4kW3u5kAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Sweep C over powers of 5, from 5**-4 up to 5**1\n",
    "C_vals = 5.0**np.arange(-4,2)\n",
    "scores = []        # test accuracy for each C\n",
    "scores_train = []  # training accuracy for each C\n",
    "for C in C_vals:\n",
    "    lr = LogisticRegression(max_iter=1000, C=C)\n",
    "    lr.fit(X_train_imdb, y_train_imdb)\n",
    "    score_train = lr.score(X_train_imdb, y_train_imdb)\n",
    "    score = lr.score(X_test_imdb, y_test_imdb)\n",
    "    scores.append(score)\n",
    "    scores_train.append(score_train)\n",
    "plt.plot(C_vals, scores);        # first line drawn: test accuracy\n",
    "plt.plot(C_vals, scores_train);  # second line drawn: training accuracy\n",
    "plt.xlabel(\"C\");\n",
    "plt.ylabel(\"accuracy score\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next class & test set\n",
    "\n",
    "- Note that I previously said not to do hyperparameter tuning on the test set, yet the plot above compares values of `C` using test scores.\n",
    "- We didn't use cross-validation today because we need one more element, Pipelines.\n",
    "- Next class we'll talk about Pipelines and hyperparameter tuning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Q&A\n",
    "\n",
    "(Pause for Q&A) \n",
    "\n",
    "<br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- IMDB movie review data: predict positive or negative review sentiment\n",
    "- How to turn text into numeric features? Word counts (or presence/absence of a word).\n",
    "  - This way we create one column per word in our \"vocabulary\".\n",
    "  - `max_features` hyperparameter controls how many words we include (takes the most frequent ones up to `max_features`)\n",
    "- Make sure to split data _before_ preprocessing.\n",
    "- Make sure not to call `fit` on validation/test data (this goes for both classifiers and transformers).\n",
    "- `predict_proba`: useful confidence scores, but we won't interpret them as actual probabilities.\n",
    "- Logistic regression coefficients:\n",
    "  - The sign matters: positive means increasing that feature gives a higher probability score for the \"positive class\" (arbitrarily defined for each problem)\n",
    "  - The magnitude matters: a larger coefficient (in absolute value) means the feature contributes more toward the scores\n",
    "- With `CountVectorizer` each feature is a word, thus each logistic regression coefficient corresponds to a word.\n",
    "  - So, by looking at the coefficients we can get a sense of which words are the \"most positive\" and \"most negative\" (more on this later in the course!)\n",
    "- Key hyperparameter of `LogisticRegression` is `C`; larger `C` leads to more complexity (higher training score, higher train-test gap)"
   ]
  },
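  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `fit`/`transform` rule and the one-coefficient-per-word idea from the summary can be sketched on a toy example. This is only an illustration under assumptions: the four-document corpus, the labels, and the names `train_text`, `test_text`, and `coef_by_word` are made up for this sketch and are not the IMDB data.\n",
    "\n",
    "```python\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "# Tiny made-up corpus (hypothetical, not the IMDB data)\n",
    "train_text = ['great movie', 'great acting', 'terrible movie', 'terrible plot']\n",
    "train_y = ['pos', 'pos', 'neg', 'neg']\n",
    "test_text = ['great plot', 'terrible acting and pacing']\n",
    "\n",
    "vec = CountVectorizer()\n",
    "X_train = vec.fit_transform(train_text)  # learn the vocabulary from training data only\n",
    "X_test = vec.transform(test_text)        # reuse that vocabulary; unseen words are ignored\n",
    "\n",
    "lr = LogisticRegression().fit(X_train, train_y)\n",
    "\n",
    "# Words listed in column order, so they line up with lr.coef_[0]\n",
    "words = sorted(vec.vocabulary_, key=vec.vocabulary_.get)\n",
    "# One coefficient per word; positive values push toward lr.classes_[1]\n",
    "coef_by_word = dict(zip(words, lr.coef_[0]))\n",
    "\n",
    "print(lr.classes_)\n",
    "print(coef_by_word)\n",
    "print(lr.predict(X_test))\n",
    "```\n",
    "\n",
    "Calling `fit_transform` on the test text instead would build a different vocabulary, so the test columns would no longer match the columns the classifier was trained on."
   ]
  },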
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## True/False questions\n",
    "\n",
    "1. With `CountVectorizer`, we should `fit` on the training data and `transform` on both the train/test data.\n",
    "2. `predict` returns the positive class if the predicted probability of the positive class is greater than 0.5.\n",
    "3. Logistic regression overfits less than decision trees.\n",
    "4. With logistic regression, we learn one weight per training example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
